ELEMENTARY BOUNDS ON MIXING TIMES FOR DECOMPOSABLE MARKOV CHAINS

NATESH S. PILLAI AND AARON SMITH


Abstract. Many finite-state reversible Markov chains can be naturally decomposed into 
“projection” and “restriction” chains. In this paper we provide bounds on the total variation 
mixing times of the original chain in terms of the mixing properties of these related chains. 
This paper is in the tradition of existing bounds on Poincaré and log-Sobolev constants of
Markov chains in terms of similar decompositions [JSTV04, MR02, MR06, MY09]. Our
proofs are simple, relying largely on recent results relating hitting and mixing times of
reversible Markov chains [PS13, Oli12]. We describe situations in which our results give
substantially better bounds than those obtained by applying existing decomposition results 
and provide examples for illustration. 


1. Introduction 

In this paper, we study the rate of convergence to stationarity of an irreducible, aperiodic
and reversible Markov chain with kernel $K$ on a finite state space $\Omega$ by decomposing $\Omega$ into
subsets $\Omega_1, \ldots, \Omega_n$. Our main results bound the total variation mixing time of $K$ in terms of
the mixing times of the traces (or restrictions) $K_i$ of $K$ on each subset $\Omega_i \subset \Omega$, combined
with some information on the mixing of $K$ between the subsets. This latter mixing time
is studied through the construction of a projected kernel $\bar{K}$ on $\{1, 2, \ldots, n\}$. Although the
details of our constructions differ, this general approach is not new: it has been the subject
of a number of papers [MR00, JSTV04, MR02, MR06, MY09] and has been successfully
applied to many problems (see, e.g., [DLP10, KO15]).

Our results, like those in [MR00, JSTV04, MR02, MR06, MY09], are useful in the common
situation that a complicated Markov chain is hard to study directly, but is composed of 
smaller pieces that are easier to study in isolation or have already been studied. Our main 
goal is to provide bounds for the mixing time that are easy to apply in a wide range of 
applications. Our main results are based on the remarkable results of [PS13, Oli12], where
the authors derived an upper bound on the mixing times of reversible Markov chains in terms 
of their hitting times. 

Our bounds are generally not comparable to earlier decomposition bounds, so we give a 
high level review of those results and explain some ways in which ours can be much better. 
Let $\varphi_i, \varphi_i^{\mathrm{rel}}$ be the mixing and relaxation times of $K_i$, let $\bar{\varphi}, \bar{\varphi}^{\mathrm{rel}}$ be the mixing and relaxation
times of $\bar{K}$, and let $\varphi, \varphi^{\mathrm{rel}}$ be the mixing and relaxation times of $K$. The main results of
earlier decomposition bounds all state, roughly, that

\[ \varphi^{\mathrm{rel}} = O\Big(\bar{\varphi}^{\mathrm{rel}} \max_{1 \le i \le n} \varphi_i^{\mathrm{rel}}\Big). \tag{1.1} \]

Figure 1. Pince-nez graph with m = 8

The main innovation of [JSTV04] allows this bound to be improved if $K$ satisfies certain
regularity conditions. One of the consequences of Theorem 1 of [JSTV04] is that, under
certain regularity conditions, the upper bound (1.1) can sometimes be replaced with the
much smaller bound

\[ \varphi^{\mathrm{rel}} = O\Big(\max\Big(\bar{\varphi}^{\mathrm{rel}}, \max_{1 \le i \le n} \varphi_i^{\mathrm{rel}}\Big)\Big). \tag{1.2} \]

Unfortunately, many Markov chains of interest that in fact satisfy (1.2) for a natural
decomposition do not satisfy the regularity conditions given as sufficient conditions in [JSTV04]
(see, e.g., the interacting particle systems we study in [PS16]). Some of the main
consequences of the results in this paper are new sufficient conditions under which a bound
similar to (1.1) can be replaced by a stronger bound similar to (1.2), and which can be used
to obtain stronger bounds on the mixing times of interacting particle systems and other
Markov chains. Although we give this overview in terms of relaxation times, all of our main
results bound the mixing time of $K$ in terms of the mixing times of $K_i$ and the occupation
times of the original Markov chain on the sets $\Omega_i$, rather than the associated relaxation
times.

Our new sufficient conditions can hold when those in [JSTV04] do not, and the new 
cases that we cover include some important examples. A simple illustrative example is the 
symmetric random walk on the $2m$-vertex “Pince-nez” graph studied in Section 4.1 of [JSTV04].
The $2m$-vertex Pince-nez graph consists of two copies of the $m$-cycle $\mathbb{Z}_m$ connected by a single edge.
The graph is pictured in Figure 1 with $m = 8$. Consider a random walk on the $2m$-vertex
Pince-nez graph which moves to a neighboring vertex with probability $c = O(1)$. It is natural
to partition this graph into two copies of $\mathbb{Z}_m$. Using this partition, the authors in [JSTV04]
showed that the relaxation time of the random walk is $O(m^3)$, which implies a bound of
$O(m^3 \log(m))$ on its mixing time. To our knowledge, no earlier works on decomposition
bounds will give a bound better than $O(m^3)$ on the mixing time for this example. Our
bounds can be used to show (see Section 5.1) that its mixing time (and thus its relaxation
time) is $O(m^2)$, which is indeed the correct order.

Of course, this “Pince-nez” example is simple, and its mixing time can be derived using 
direct arguments such as coupling. We emphasize this example because it has two traits that 
are typical of chains and partitions for which our approach can improve on previous results: 
there are only a small number of ‘important’ parts of the partition (i.e., $n$ is small compared
to $\bar{\varphi}$ and $\max_i \varphi_i$), and the exit probabilities $K(x, \Omega_2)$ from one part of the partition to
another are very far from uniform in the initial point $x \in \Omega_1$. There are many interesting
examples of Markov chains having these traits.
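
The second trait can be seen directly on the Pince-nez graph. The following is a minimal sketch, not taken from the paper: the vertex labelling, the helper name `pince_nez_kernel`, and the choice $c = 1/6$ (which makes the walk at least $\frac{1}{2}$-lazy at every vertex) are assumptions made for illustration. It builds the kernel exactly and lists the exit probabilities $K(x, \Omega_2)$ for $x \in \Omega_1$.

```python
from fractions import Fraction

def pince_nez_kernel(m, c):
    """Kernel of a walk on two m-cycles joined by one edge (vertices 0..2m-1).

    Assumption for this sketch: each neighbour is chosen with probability c,
    the walk holds otherwise, and the bridge joins vertex 0 to vertex m.
    """
    n = 2 * m
    K = [[Fraction(0)] * n for _ in range(n)]

    def add_edge(u, v):
        K[u][v] = c
        K[v][u] = c

    for side in (0, 1):
        base = side * m
        for k in range(m):
            add_edge(base + k, base + (k + 1) % m)
    add_edge(0, m)  # the single edge between the two cycles
    for x in range(n):
        K[x][x] = 1 - sum(K[x])  # holding probability
    return K

m, c = 8, Fraction(1, 6)
K = pince_nez_kernel(m, c)
left, right = range(m), range(m, 2 * m)
# Exit probabilities K(x, Omega_2) for x in Omega_1: only the bridge vertex can exit.
exit_probs = [sum(K[x][y] for y in right) for x in left]
# K is symmetric, so pi is uniform and the projected entry is the plain average.
K_bar_12 = sum(exit_probs) / m
```

Only one of the $m$ starting points in $\Omega_1$ has a nonzero exit probability, so the ratio of the largest to the smallest exit probability is unbounded, while the averaged rate `K_bar_12` is of order $1/m$.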

The motivation behind this paper was the study of interacting particle systems. Using 
the main results of this paper, we were able to resolve a conjecture of David Aldous on the 
mixing time of a ‘constrained’ Ising process on the lattice [PS16], up to a logarithmic factor. 
The kinetically constrained Ising process (KCIP) is described carefully in Example 5.1, and 
Example 5.2 describes a toy version of the KCIP that illustrates the key difference between 
our bound and that in [JSTV04]. We expect our bounds to be helpful in the study of other 
interacting particle systems with a varying number of particles. To explore this, in Section 
5.2.1 we construct a large class of interacting particle systems, show that the bounds in 
[JSTV04] cannot generally be tight, and explain why our bounds can be. 

Our main application in Section 4.2 gives another situation in which our bound is roughly 
of the form (1.2) while the bounds in [JSTV04] are roughly of the form (1.1). For both the 
KCIP and the family of examples in Section 4.2, our bounds allow us to recover the correct 
mixing time up to a factor that is logarithmic in the problem size, while the bounds from 
[JSTV04] are off by a factor that is polynomial in the problem size and can be close to the 
square of the correct mixing time. 

Our results can improve upon earlier bounds in other interesting situations, and can be 
worse than earlier bounds in others. See Section 1.4 for a brief overview, and Inequality 
(2.18) for a bound that is most visually similar to earlier decomposition bounds. 

1.1. Paper Overview. After giving initial notation in Sections 1.2, 1.3 below, we give a 
‘user’s guide’ to the paper in Section 1.4. One of the main attractions of decomposition 
bounds such as those in [JSTV04] is their ease of use: you can ‘plug in’ estimates of certain 
familiar quantities related to the kernels Ki and K, such as their relaxation times, to obtain 
estimates for the kernel K. Our bounds are often similarly easy to use. However, the bounds 
in the different sections of our paper are written in terms of different quantities, and several 
of our intermediate bounds are written in terms of quantities that may not be familiar to 
the reader. Thus, it may not be immediately obvious which bounds are most relevant for a 
given problem. Our ‘user’s guide’ is designed to resolve this difficulty, allowing the reader 
to quickly obtain bounds that are written entirely in terms of familiar quantities, such as a 
mixing time or Lyapunov function. For several situations, the user’s guide describes why our 
bounds may improve upon earlier bounds, refers to the most relevant and simplest bounds 
in our paper, and also refers to a prototypical worked example. 

In Section 2, we give our main lemmas and apply them to obtain an initial result that is 
visually similar to earlier decomposition bounds. In Section 3 we introduce the notion of the 
well-covering time as well as a comparison theory for well-covering times; these are used to 
obtain decomposition bounds that are easier to compute than those in Section 2. In Section 4 
we give much stronger decomposition bounds under certain regularity conditions and apply 
them. Finally, Section 5 contains two applications. Auxiliary results and derivations are 
deferred to an Appendix. 

1.2. Basic Notation. We write $\mathbb{N} = \{0, 1, 2, \ldots\}$. For any $m \ge 1$, define $[m] = \{1, 2, \ldots, m\}$.
For two positive functions $f, g$ on $\mathbb{N}$, we write $f = O(g)$ for $\sup_m \frac{f(m)}{g(m)} < \infty$, we write
$f = \Theta(g)$ if both $f = O(g)$ and $g = O(f)$, and we write $f = o(g)$ for $\limsup_{m \to \infty} \frac{f(m)}{g(m)} = 0$. For a
sequence of positive random variables $\{X_m\}_{m \ge 0}$ and a sequence of positive integers or random
variables $\{Y_m\}_{m \ge 0}$, we say that $X_m = O(Y_m)$ (or $X_m = O(Y_m)$ with high probability) if for all
$\epsilon > 0$ there exists a constant $C = C(\epsilon) < \infty$ so that $\limsup_{m \to \infty} P[X_m > C Y_m] < \epsilon$. In both
cases, we sometimes write “$f$ is at least on the order of $g$” to mean $g = O(f)$. We say that
a random variable $X$ stochastically dominates a random variable $Y$ if $P[X > s] \ge P[Y > s]$
for all $s \in \mathbb{R}$. The letters $C$, $C_1$ etc. will denote generic constants whose value may change
from one occurrence to the next but are independent of the problem size or $n$, the number of
parts of the partition.

For a monotonely increasing (but not necessarily injective) function $f$ on $\mathbb{N}$, we write

\[ f^{-1}(x) = \min\{y : f(y) \ge x\}. \]

For two distributions $\mu, \nu$ on a finite state space $\Omega$, the TV, or total variation, distance
between $\mu$ and $\nu$ is given by

\[ \|\mu - \nu\|_{\mathrm{TV}} = \max_{A \subset \Omega} (\mu(A) - \nu(A)). \]

The mixing profile of a Markov chain $\{X_t\}_{t \in \mathbb{N}}$ on $\Omega$ with stationary measure $\pi$ is defined as

\[ \tau(\epsilon) = \min\Big\{t \ge 0 : \max_{X_0 = x \in \Omega} \|\mathcal{L}(X_t) - \pi\|_{\mathrm{TV}} \le \epsilon\Big\} \]

for all $0 < \epsilon < 1$. As usual, the mixing time is defined by $\tau_{\mathrm{mix}} = \tau(0.25)$. The dependence
of a Markov chain on the initial conditions $X_0 = x$ is denoted by a subscript, e.g.,

\[ E_x[\cdot] = E[\cdot \mid X_0 = x]. \]
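
For small chains the mixing profile can be computed by brute force. The sketch below is an illustration under stated assumptions, not part of the paper: the function names are hypothetical, the threshold is taken as "at most $\epsilon$" with $t$ starting at $0$, and the two-state kernel `K2` is a made-up example. It uses the standard identity that the total variation distance equals half the $\ell^1$ distance.

```python
from fractions import Fraction

def tv_distance(mu, nu):
    """Total variation distance, computed as half the L1 distance."""
    return sum(abs(a - b) for a, b in zip(mu, nu)) / 2

def step(dist, K):
    """One step of the chain: row vector times kernel."""
    n = len(K)
    return [sum(dist[x] * K[x][y] for x in range(n)) for y in range(n)]

def mixing_profile(K, pi, eps, t_max=10_000):
    """Smallest t >= 0 with max_x ||K^t(x, .) - pi||_TV <= eps."""
    n = len(K)
    rows = [[Fraction(int(x == y)) for y in range(n)] for x in range(n)]
    for t in range(t_max + 1):
        if max(tv_distance(r, pi) for r in rows) <= eps:
            return t
        rows = [step(r, K) for r in rows]
    raise ValueError("not mixed within t_max steps")

# Hypothetical example: a lazy two-state chain with uniform stationary measure.
K2 = [[Fraction(2, 3), Fraction(1, 3)],
      [Fraction(1, 3), Fraction(2, 3)]]
pi2 = [Fraction(1, 2), Fraction(1, 2)]
tau_mix = mixing_profile(K2, pi2, Fraction(1, 4))
```

For this chain the worst-case distance after $t$ steps is $\frac{1}{2} \cdot 3^{-t}$, so the profile can be checked by hand.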

1.3. Projections and Restrictions of Markov Kernels. Let $\{X_t\}_{t \in \mathbb{N}}$ be an irreducible,
reversible and $\frac{1}{2}$-lazy Markov chain with kernel $K$ and stationary distribution $\pi$ on $\Omega$. Our
goal is to bound the mixing time of $\{X_t\}_{t \in \mathbb{N}}$ in terms of the mixing times of various restricted
and projected chains. Fix $n \in \mathbb{N}$ and let $\Omega = \sqcup_{i=1}^{n} \Omega_i$ be a partition of $\Omega$ into $n$ disjoint parts.

Define the projection function $\mathcal{P}$ on $\Omega$ by

\[ \mathcal{P}(x) = \{1 \le i \le n : x \in \Omega_i\}. \tag{1.3} \]

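For $1 \le i \le n$, set $\eta_i(0) = -1$. Then, for $s \in \mathbb{N}$, recursively define the sequence of hitting
and occupation times

\[ \eta_i(s) = \min\{t > \eta_i(s-1) : X_t \in \Omega_i\}, \qquad \kappa_i(s) = \max\{u : \eta_i(u) \le s\} = \sum_{u=0}^{s} \mathbf{1}_{X_u \in \Omega_i}. \tag{1.4} \]

Both $\eta, \kappa$ depend on the initial condition $X_0$. We also define the associated restricted
processes $\{X_t^{(i)}\}_{t \in \mathbb{N}}$ by

\[ X_t^{(i)} = X_{\eta_i(t)}. \tag{1.5} \]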
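
To make the bookkeeping concrete, the following sketch computes visit times, occupation counts and the restricted process for one fixed trajectory. It is an illustration only: the function names and the sample path are hypothetical, and the indexing convention (visits are counted from time $0$, and the restricted process starts at the first visit) is an assumption of this sketch.

```python
def hitting_times(path, part):
    """Times t >= 0 at which the path is in `part`: eta(1), eta(2), ..."""
    return [t for t, x in enumerate(path) if x in part]

def occupation(path, part, s):
    """kappa(s) = max{u : eta(u) <= s}, i.e. the number of visits up to time s."""
    return sum(1 for t in hitting_times(path, part) if t <= s)

def trace(path, part):
    """The path watched only while it is inside `part`."""
    return [path[t] for t in hitting_times(path, part)]

# Hypothetical trajectory on states {'a', 'b', 'c'} and one part of a partition.
path = ['a', 'b', 'a', 'c', 'c', 'a']
part = {'a', 'c'}
visits = hitting_times(path, part)
```

The restricted process simply deletes the time spent outside the part, which is why its clock runs on the occupation time rather than on the original time.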

This is also called the trace of $\{X_t\}_{t \in \mathbb{N}}$ on $\Omega_i$. Since $\{X_t\}_{t \in \mathbb{N}}$ is recurrent, we have for all
$t \in \mathbb{N}$ that $\eta_i(t) < \infty$ almost surely, and so $X_t^{(i)}$ is almost surely well-defined for all $t \in \mathbb{N}$.
The trace $\{X_t^{(i)}\}_{t \in \mathbb{N}}$ is a Markov chain on $\Omega_i$, and we denote by $K_i$ the associated transition
kernel on $\Omega_i$.$^1$ The kernel $K_i$ inherits irreducibility, reversibility and $\frac{1}{2}$-laziness from $K$, and
its stationary distribution is given by $\pi_i(A) = \frac{\pi(A)}{\pi(\Omega_i)}$ for all $A \subset \Omega_i$. Let $\varphi_i$ be the mixing
time of $K_i$ and set

\[ \varphi_{\max} = \max_{1 \le i \le n} \varphi_i. \tag{1.6} \]

$^1$ The transition kernel $K_i'$ corresponding to the notion of a restricted chain defined in [JSTV04, Section
2] satisfies $K_i'(x, y) \le K_i(x, y)$ for all $x \ne y$. This means that our restricted chain is generally more rapidly
mixing (e.g., $K_i$ dominates $K_i'$ in the sense of [Tie98]), and it can be much more rapidly mixing (e.g., when
$K$ is ergodic, $K_i$ is always ergodic and has a smaller mixing time than $K$, while $K_i'$ may not even be ergodic).
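
The trace kernel can be computed exactly for small examples. The sketch below is not from the paper; it relies on the standard first-return decomposition $K_i(x,y) = K(x,y) + \sum_{u \notin \Omega_i} K(x,u)\, P_u[\text{first entry to } \Omega_i \text{ is at } y]$, and the helper names and the three-state example are hypothetical. Kernel entries are assumed to be `Fraction` values so that the linear solve is exact.

```python
from fractions import Fraction

def solve(M, R):
    """Solve M X = R exactly by Gauss-Jordan elimination (M square, invertible)."""
    n = len(M)
    aug = [list(M[i]) + list(R[i]) for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def trace_kernel(K, A):
    """Kernel of the chain K watched only on the states in A."""
    A = list(A)
    B = [x for x in range(len(K)) if x not in A]
    if not B:
        return [[K[x][y] for y in A] for x in A]
    I_minus_KBB = [[Fraction(int(u == v)) - K[u][v] for v in B] for u in B]
    K_BA = [[K[u][y] for y in A] for u in B]
    H = solve(I_minus_KBB, K_BA)  # H[j][k] = P_{B[j]}[first entry to A is at A[k]]
    return [[K[x][y] + sum(K[x][u] * H[j][k] for j, u in enumerate(B))
             for k, y in enumerate(A)] for x in A]

# Hypothetical example: lazy walk on the path 0 - 1 - 2, traced on {0, 2}.
K3 = [[Fraction(3, 4), Fraction(1, 4), Fraction(0)],
      [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)],
      [Fraction(0), Fraction(1, 4), Fraction(3, 4)]]
K_trace = trace_kernel(K3, [0, 2])
```

In this example the original kernel has no direct transition between states 0 and 2, yet the trace does, because excursions through state 1 are collapsed into a single step.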


We next define the projected kernel $\bar{K}$. The state space of the projected chain is the set
$[n]$ and its transition kernel is defined by

\[ \bar{K}(i, j) = \frac{1}{\pi(\Omega_i)} \sum_{x \in \Omega_i,\, y \in \Omega_j} \pi(x) K(x, y). \tag{1.7} \]
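
Formula (1.7) translates directly into code. The following minimal sketch assumes the kernel is given as a nested list, $\pi$ as a list, and the partition as a list of lists of state indices; the function name and the four-state example are hypothetical.

```python
from fractions import Fraction

def projected_kernel(K, pi, parts):
    """K_bar(i, j) = (1 / pi(Omega_i)) * sum_{x in Omega_i, y in Omega_j} pi(x) K(x, y)."""
    K_bar = []
    for Oi in parts:
        mass = sum(pi[x] for x in Oi)
        K_bar.append([sum(pi[x] * K[x][y] for x in Oi for y in Oj) / mass
                      for Oj in parts])
    return K_bar

# Hypothetical example: lazy walk on the path 0 - 1 - 2 - 3, split into two halves.
q, h = Fraction(1, 4), Fraction(1, 2)
K4 = [[3 * q, q, 0, 0],
      [q, h, q, 0],
      [0, q, h, q],
      [0, 0, q, 3 * q]]
pi4 = [Fraction(1, 4)] * 4
K_bar = projected_kernel(K4, pi4, [[0, 1], [2, 3]])
```

The projected chain moves between the two halves with probability $\frac{1}{8}$: the crossing probability $\frac{1}{4}$ at the boundary state, averaged over the two states of each half.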


Throughout the paper, we are often interested in hitting times of various sets. We recall
that, if $\{X_t\}_{t \ge 0}$ is a Markov chain with state space $\Omega$ and $A \subset \Omega$, the hitting time
$\tau_A = \min\{t \ge 0 : X_t \in A\}$ satisfies

\[ \max_{x \in \Omega} P_x[\tau_A > kt] \le \Big(\max_{x \in \Omega} P_x[\tau_A > t]\Big)^k \tag{1.8} \]

for all $k, t \in \mathbb{N}$. In particular, this implies

\[ \max_{x \in \Omega} P_x\Big[\frac{e^{-1} \tau_A}{\max_{y \in \Omega} E_y[\tau_A]} > t\Big] \le e^{-\lfloor t \rfloor}. \tag{1.9} \]

See inequality (5.12) in Appendix A for a short proof of (1.9). We use this fact throughout
the paper, sometimes referring to the ‘subgeometric tails’ of the hitting time distribution.
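
Inequality (1.8) is easy to check numerically on a small chain. The sketch below is an illustration with hypothetical names and a made-up three-state kernel: it computes the worst-case tail $\max_x P_x[\tau_A > t]$ exactly from the recursion $P_x[\tau_A > t] = \sum_{y \notin A} K(x,y) P_y[\tau_A > t-1]$ for $x \notin A$.

```python
from fractions import Fraction

def max_tail(K, A, t):
    """max_x P_x[tau_A > t], via t-step survival probabilities outside A."""
    B = [x for x in range(len(K)) if x not in A]
    surv = {x: Fraction(1) for x in B}  # P_x[tau_A > 0] = 1 for x outside A
    for _ in range(t):
        surv = {x: sum(K[x][y] * surv[y] for y in B) for x in B}
    return max(surv.values(), default=Fraction(0))

# Hypothetical example: lazy walk on the path 0 - 1 - 2 with target set A = {0}.
K_path = [[Fraction(3, 4), Fraction(1, 4), Fraction(0)],
          [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)],
          [Fraction(0), Fraction(1, 4), Fraction(3, 4)]]
A_set = {0}
submultiplicative = all(
    max_tail(K_path, A_set, k * t) <= max_tail(K_path, A_set, t) ** k
    for k in range(1, 5) for t in range(1, 6)
)
```

The check confirms the submultiplicative behaviour on this example; it is a sanity check, not a proof.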


1.4. User’s Guide. We give an overview of our results, with the aim of directing a reader 
to the most relevant bounds. Briefly, Lemma 2.2 gives the strongest bounds in this paper, 
while Inequality (3.10) gives mixing time bounds that are easiest to compute in terms of 
standard estimates. 

We now describe several common situations in which our bounds can improve on those in 
the literature: 


• Situation: $n$ is small while

\[ \max_{1 \le i \le n} \frac{\max_{x \in \Omega_i} \sum_{y \notin \Omega_i} K(x, y)}{\min_{x \in \Omega_i} \sum_{y \notin \Omega_i} K(x, y)} \]

is large.


— Improvement: Improves the mixing bound from a large function of $\varphi_i$ (e.g.
roughly $O(\bar{\varphi} \max_{1 \le i \le n} \varphi_i)$) to a smaller function (e.g. roughly $O(\max(\bar{\varphi}, \max_{1 \le i \le n} \varphi_i))$),
at the cost of a poor dependence on the number of parts $n$ of the partition.

— Relevant Results: Lemma 2.6 gives this improvement in the simplest
situations; Lemma 2.1 gives this improvement for more difficult examples. Results in
Section 4.1 may be necessary if one has useful bounds only on $\max_{a \le i \le b} \varphi_i$ for
some $(a, b) \ne (1, n)$.

— Worked Examples: Example 5.1 is the simplest example illustrating this
improvement. Example 5.2 obtains a similar improvement in a more complicated
situation, using the bounds in Section 4.1.

— Requires: Bounds on the mixing times $\varphi_i$, the expected escape times for $\Omega_i$,
and lower bounds on the elements of $\bar{K}$.

• Situation: As above, but $n$ is large and $K$ exhibits the following approximate
metastability: there exists some $1 \le k \le n$ so that the mixing time $\varphi_i$ of $K_i$ is
small compared to the escape times $\min_{x \in \Omega_i} E[\min\{t : X_t \in \Omega \setminus \Omega_i\} \mid X_0 = x]$
for all $1 \le i \le k$.

— Improvement: As above.

— Relevant Results: The main result is Theorem 4; as above, results in Section
4.1 could be helpful.

— Worked Examples: Example 4.3.

— Requires: Bounds on the mixing times $\varphi_i$, the expected escape times for $\Omega_i$,
and a contraction or mixing condition for a projected chain similar to $\bar{K}$.

• Situation: A bound on the mixing time is needed, but $\min_x \pi(x)$ is very small.

— Improvement: Recall that the mixing time $\varphi$ and relaxation time $\varphi^{\mathrm{rel}}$ of a
Markov chain satisfy

\[ \varphi^{\mathrm{rel}} \lesssim \varphi = O\big(-\varphi^{\mathrm{rel}} \log(\min_x \pi(x))\big); \]

both inequalities are sharp. Thus, bounding the mixing time of a Markov chain
directly can give an improvement of a factor of $-\log(\min_x \pi(x))$ over a bound
that passes through the relaxation time. This factor can be arbitrarily large,
and is particularly important for limiting arguments.

— Relevant Results: All results in this paper can deliver this type of
improvement. If the very small value of $\min_x \pi(x)$ is the main obstacle to using an
existing decomposition bound, we recommend beginning with Section 3, and in
particular the relatively weak but simple Inequality (3.10). We note that many
bounds in Section 3 are stated in terms of a somewhat complicated quantity we
call the well-covering number. As discussed in the introduction to that section,
a collection of comparison inequalities allows these bounds to be used without
the explicit computation of new well-covering numbers.

— Worked Examples: Inequality (3.10); can be applied to other examples (e.g.
Section 5.1).

— Requires: Inequality (3.10) requires estimates of the mixing times $\varphi_i$ for some
values of $i$ as well as entry-wise lower bounds on $\bar{K}$.

2. Main Result: General Mixing Bounds for Decomposable Markov Chains

Let $\{X_t\}_{t \in \mathbb{N}}$ be an irreducible, $\frac{1}{2}$-lazy, reversible Markov chain on a finite state space $\Omega$
started at $X_0 = z$. For $A \subset \Omega$, define

\[ \tau_A = \min\{t : X_t \in A\} \tag{2.1} \]

to be the first hitting time of $A$. Theorem 1.1 of [PS13] yields that, for $0 < \alpha < \frac{1}{2}$, there
exist universal constants $c_\alpha$ and $c'_\alpha$ such that$^2$

\[ c'_\alpha \max_{z, A : \pi(A) \ge \alpha} E_z(\tau_A) \le \tau_{\mathrm{mix}} \le c_\alpha \max_{z, A : \pi(A) \ge \alpha} E_z(\tau_A). \tag{2.2} \]

Thus (2.2) relates mixing times to hitting times of large sets. It is not hard to show
$\tau_{\mathrm{mix}} \ge c'_\alpha \max_{z, A : \pi(A) \ge \alpha} E_z(\tau_A)$. The key inequality is the upper bound in (2.2). As discussed in
[PS13], this upper bound does not hold for non-reversible chains. For the rest of the paper,
$c_\alpha$ and $c'_\alpha$ will refer to the constants in Equation (2.2).

For $A \subset \Omega$, let $A_i = A \cap \Omega_i$. Following (2.1), let $\tau^{(i)}_{A_i}$ be the first hitting time of $A_i$ for the
trace $\{X_t^{(i)}\}_{t \in \mathbb{N}}$. The following simple bound on the mixing time of $K$, based on (2.2), forms
the basis of our approach:

$^2$ The version of Theorem 1.1 in [PS13] and their constants $c_\alpha, c'_\alpha$ are slightly different from ours. See
Appendix A for a proof of Equation (2.2).
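Lemma 2.1 (Basic Mixing Bound). Fix $0 < \alpha < \frac{1}{2}$ and $1 - \alpha < \beta < 1$. Let
$0 < \gamma < \min\big(\frac{1}{2}, 1 - \frac{1 - \alpha}{\beta}\big)$, and fix some $I \subset \{1, 2, \ldots, n\}$ that satisfies

\[ \sum_{i \in I} \pi(\Omega_i) \ge \beta. \]

Define

\[ \mathcal{T} = \min\Big\{T : \min_{0 < t < T} \max_{i \in I} \Big(\frac{\varphi_i}{c'_\gamma t} + \max_{z \in \Omega} P_z[\kappa_i(T) < t]\Big) \le \frac{1}{4}\Big\}. \tag{2.3} \]

Then the mixing time $\tau_{\mathrm{mix}}$ of $\{X_t\}_{t \in \mathbb{N}}$ satisfies

\[ \tau_{\mathrm{mix}} \le \frac{4}{3} c_\alpha \mathcal{T}. \]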

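Proof. Fix a set $A \subset \Omega$ with $\pi(A) \ge \alpha$. We will denote $A_i = A \cap \Omega_i$ for all $1 \le i \le n$. We
claim that there exists $j \in I$ so that

\[ \pi_j(A_j) \ge \gamma > 0. \tag{2.4} \]

To see this, assume that inequality (2.4) is not satisfied for any $i \in I$. Then we would have

\begin{align*}
\pi(A) &= \sum_{i \notin I} \pi(A_i) + \sum_{i \in I} \pi(A_i) \\
&\le \Big(1 - \sum_{i \in I} \pi(\Omega_i)\Big) + \gamma \sum_{i \in I} \pi(\Omega_i) \\
&= 1 - (1 - \gamma) \sum_{i \in I} \pi(\Omega_i) \\
&\le 1 - (1 - \gamma)\beta < \alpha,
\end{align*}

contradicting the assumption that $\pi(A) \ge \alpha$. Thus, inequality (2.4) is satisfied.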

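Let $j \in I$ be an index satisfying $\pi_j(A_j) \ge \gamma$. For any $T \in \mathbb{N}$ and $0 < t < T$, we have

\[ \{\tau_A \le T\} \supset \{\tau_{A_j} \le T\} = \{\tau^{(j)}_{A_j} \le \kappa_j(T)\} \supset \{\tau^{(j)}_{A_j} \le t\} \cap \{\kappa_j(T) \ge t\}. \]

This gives

\begin{align*}
P_z[\tau_A > T] &\le P_z[\tau^{(j)}_{A_j} > \kappa_j(T)] \\
&\le \max_{y \in \Omega_j} P_y[\tau^{(j)}_{A_j} > t] + P_z[\kappa_j(T) < t] \\
&\le \max_{y \in \Omega_j} \frac{E_y[\tau^{(j)}_{A_j}]}{t} + P_z[\kappa_j(T) < t] \\
&\le \frac{\varphi_j}{c'_\gamma t} + P_z[\kappa_j(T) < t],
\end{align*}

where the last inequality follows from Equation (2.2). Thus, for $T \ge \mathcal{T}$, with $\mathcal{T}$ as in (2.3),

\[ \max_{z \in \Omega} P_z[\tau_A > T] \le \frac{1}{4}. \]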

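Since this holds for all $z \in \Omega$ and all $A$ with $\pi(A) \ge \alpha$, by Equation (1.8) we have

\[ \max_{z \in \Omega,\, \pi(A) \ge \alpha} E_z[\tau_A] \le \frac{4}{3} \mathcal{T}, \]

where $\mathcal{T}$ is the quantity defined in (2.3). By Equation (2.2),

\[ \tau_{\mathrm{mix}} \le c_\alpha \max_{z \in \Omega,\, \pi(A) \ge \alpha} E_z[\tau_A] \le \frac{4}{3} c_\alpha \mathcal{T}, \]

completing the proof. □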
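
The step from the tail bound to the expectation bound in the proof above is a geometric series. As a sketch of the omitted computation, writing $T$ for any time at which $\max_{z} P_z[\tau_A > T] \le \frac{1}{4}$ holds, grouping the tail sum into blocks of length $T$ and applying (1.8) gives:

```latex
\begin{align*}
E_z[\tau_A] = \sum_{s \ge 0} P_z[\tau_A > s]
  &\le T \sum_{k \ge 0} P_z[\tau_A > kT] \\
  &\le T \sum_{k \ge 0} \Big(\max_{x \in \Omega} P_x[\tau_A > T]\Big)^{k} \\
  &\le T \sum_{k \ge 0} 4^{-k} = \frac{4}{3}\, T .
\end{align*}
```

The first inequality uses that $s \mapsto P_z[\tau_A > s]$ is non-increasing, so each block of $T$ consecutive terms is bounded by $T$ times its first term.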


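Our next result weakens the requirement in Lemma 2.1 that $\max_{z \in \Omega} P_z[\kappa_i(T) < t]$ must
be small for all $i \in I$ to the requirement that only $\max_{z \in \Omega} P_z[\cap_{i \in I}\{\kappa_i(T) < t\}]$ needs to
be small, at the cost of requiring the bound be uniform over all $I \subset [n]$ with $\pi(\cup_{i \in I} \Omega_i)$
sufficiently large. Define

\[ \mathcal{T}' = \min\Big\{T : \min_{0 < t < T} \max_{I \subset [n] : \pi(\cup_{i \in I} \Omega_i) \ge \frac{\alpha}{2}} \Big(\max_{z \in \Omega} P_z[\cap_{i \in I}\{\kappa_i(T) < t\}] + \sum_{i \in I} e^{-\lfloor c'_{\alpha/2} t / (e \varphi_i) \rfloor}\Big) \le \frac{1}{4}\Big\}. \tag{2.5} \]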

In the proof of the following lemma, we bound the distribution of the number of steps in
an excursion from a given set $A$ by the distribution of the hitting time to $A$ from the worst
possible starting point in $\Omega$.

Lemma 2.2. Fix $0 < \alpha < \frac{1}{2}$ and let $\mathcal{T}'$ be as in (2.5). Then the mixing time $\tau_{\mathrm{mix}}$ of $\{X_t\}_{t \in \mathbb{N}}$
satisfies

\[ \tau_{\mathrm{mix}} \le \frac{4}{3} c_\alpha \mathcal{T}'. \]


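Proof. The proof is similar to that of Lemma 2.1. Fix a set $A \subset \Omega$ with $\pi(A) \ge \alpha$. Define
the set

\[ J_A = \Big\{i \in [n] : \frac{\pi(\Omega_i \cap A)}{\pi(\Omega_i)} \ge \frac{\alpha}{2}\Big\}. \]

We claim that

\[ \pi(\cup_{i \in J_A} \Omega_i) \ge \frac{\alpha}{2}. \tag{2.6} \]

To see this, assume that $\pi(\cup_{i \in J_A} \Omega_i) = p < \frac{\alpha}{2}$. Then

\begin{align*}
\alpha \le \pi(A) &= \sum_{i \in [n]} \pi(\Omega_i \cap A) \\
&= \sum_{i \in J_A} \pi(\Omega_i) \frac{\pi(\Omega_i \cap A)}{\pi(\Omega_i)} + \sum_{i \notin J_A} \pi(\Omega_i) \frac{\pi(\Omega_i \cap A)}{\pi(\Omega_i)} \\
&\le p + \frac{\alpha}{2}(1 - p) < \alpha, \tag{2.7}
\end{align*}

yielding a contradiction. The final inequality of (2.7) follows from the fact that $p + \frac{\alpha}{2}(1 - p) < \alpha$
for $p < \frac{\alpha}{2 - \alpha}$. Thus (2.6) holds.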
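
The covering claim used at the start of this proof (any set of stationary mass at least $\alpha$ fills at least an $\frac{\alpha}{2}$ fraction of parts whose total mass is at least $\frac{\alpha}{2}$) can be spot-checked on random instances. The sketch below is illustrative only: it assumes a uniform stationary measure on $N$ points, a random partition, and hypothetical function names.

```python
from fractions import Fraction
import random

def heavy_parts(pi, parts, A, alpha):
    """Indices i with pi(Omega_i & A) / pi(Omega_i) >= alpha / 2."""
    A = set(A)
    J = []
    for i, Oi in enumerate(parts):
        mass = sum(pi[x] for x in Oi)
        if sum(pi[x] for x in Oi if x in A) >= alpha / 2 * mass:
            J.append(i)
    return J

def check_claim(N, n, alpha, rng):
    """Draw a random partition and a random set A with pi(A) >= alpha; test the claim."""
    pi = [Fraction(1, N)] * N
    labels = [rng.randrange(n) for _ in range(N)]
    parts = [[x for x in range(N) if labels[x] == i] for i in range(n)]
    parts = [p for p in parts if p]  # drop empty parts
    size = -(-alpha.numerator * N // alpha.denominator)  # ceil(alpha * N)
    A = rng.sample(range(N), size)
    J = heavy_parts(pi, parts, A, alpha)
    covered = sum(pi[x] for i in J for x in parts[i])
    return covered >= alpha / 2

all_ok = all(check_claim(30, 5, Fraction(1, 3), random.Random(seed))
             for seed in range(50))
```

Exact rational arithmetic is used so that the comparison with $\frac{\alpha}{2}$ is not affected by rounding.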






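The goal is to bound $\tau_A$ by $\{\tau^{(i)}_{A_i}\}_{i \in J_A}$. For all $T \in \mathbb{N}$ and $z \in \Omega$,

\begin{align*}
P_z[\tau_A > T] &= P_z[\cap_{i \in [n]} \{\tau^{(i)}_{A_i} > \kappa_i(T)\}] \\
&\le \min_{0 < t < T} \Big(P_z[\cap_{i \in J_A}\{\kappa_i(T) < t\}] + P_z[\cup_{i \in J_A}\{\tau^{(i)}_{A_i} > t\}]\Big) \\
&\le \min_{0 < t < T} \Big(\max_{x \in \Omega} P_x[\cap_{i \in J_A}\{\kappa_i(T) < t\}] + \sum_{i \in J_A} e^{-\lfloor t / (e \max_{y \in \Omega_i} E_y[\tau^{(i)}_{A_i}]) \rfloor}\Big) \\
&\le \min_{0 < t < T} \Big(\max_{x \in \Omega} P_x[\cap_{i \in J_A}\{\kappa_i(T) < t\}] + \sum_{i \in J_A} e^{-\lfloor c'_{\alpha/2} t / (e \varphi_i) \rfloor}\Big),
\end{align*}

where the penultimate inequality follows from inequality (1.9) and the last inequality follows
from inequality (2.2). Since $\pi(\cup_{i \in J_A} \Omega_i) \ge \frac{\alpha}{2}$, for $T \ge \mathcal{T}'$, with $\mathcal{T}'$ as in (2.5), we have

\[ \max_{z \in \Omega} P_z[\tau_A > T] \le \frac{1}{4}. \]

Since this holds for all $A$ with $\pi(A) \ge \alpha$, we have by Equation (1.8)

\[ \max_{z \in \Omega,\, \pi(A) \ge \alpha} E_z[\tau_A] \le \frac{4}{3} \mathcal{T}'. \]

By Equation (2.2),

\[ \tau_{\mathrm{mix}} \le c_\alpha \max_{z \in \Omega,\, \pi(A) \ge \alpha} E_z[\tau_A] \le \frac{4}{3} c_\alpha \mathcal{T}', \]

completing the proof. □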

The following example shows that Lemma 2.2 can be much stronger than Lemma 2.1: 


Example 2.3. Fix $3 \le d \in \mathbb{N}$, let $\{G_m\}_{m \ge d+1}$ be a sequence of $d$-regular expander graphs
with $|G_m| = m$ (see, e.g., Theorem 4.16 of [HLW06] for proof that such a sequence exists).
Let $Q_m$ be the kernel associated with $\frac{1}{2}$-lazy simple random walk on $G_m$. By Theorem 3.2
of [HLW06], the mixing time of Qm is 0(log(m)). Fix a sequence e = < min(l, 

let $\Omega_m = \{(i, u) : i \in \{1, 2\}, u \in G_m\}$ and define the kernel $K_m$ by setting

Kmiil,u),{l,v)) = Qmiu,v), ( (1 , u), ( 2 , u)) = ^, iF^((2,M), (1 ,m)) = e, 

setting $K_m(x, y) = 0$ for all $y \ne x$ not of the form listed above, and finally setting
$K_m(x, x) = 1 - \sum_{y \ne x} K_m(x, y)$. Let $\pi$ denote the stationary measure of $K_m$ and $\tau_{\mathrm{mix}}$ denote its
mixing time.

For each $m \in \mathbb{N}$, we consider the partition $\Omega = \sqcup_{u \in G_m} \Omega_u$ of $\Omega = \Omega_m$ into the $m$ sets
$\Omega_u = \{(1, u), (2, u)\}$. We will show that Lemma 2.1 cannot obtain any bound on the mixing
time stronger than $\tau_{\mathrm{mix}} = O(m)$, while Lemma 2.2 can be used to obtain the correct bound
of $\tau_{\mathrm{mix}} = O(\epsilon^{-1} \log(m))$. When $\log(m) \ll \epsilon^{-1} \ll \frac{m}{\log(m)}$, this is a substantial difference.

By symmetry, $\pi(\Omega_u) = \frac{1}{m}$ for all $u \in G_m$. Fix some $0 < \alpha < \frac{1}{2}$. Then the set $I$ used in the
statement of Lemma 2.1 must be of size at least $(1 - \alpha) m$. Since the restriction of $K_m$ to $\Omega_u$ has
only two points, the mixing time $\varphi_u$ can be computed explicitly; it is $\Theta(1)$. Lemma 2.1
requires that for some $t < T$ (see Equation (2.3)), both $\frac{\varphi_i}{c'_\gamma t} \le \frac{1}{4}$ and $P[\kappa_i(T) < t] \le \frac{1}{4}$
for all $i \in I$. If $P[\kappa_i(T) < t] \le \frac{1}{4}$ holds for all $i \in I$, the pigeonhole principle implies that
we must have $T \ge C|I|t \ge C'mt$ for some universal constants $C, C' > 0$. Thus, applying
Lemma 2.1 cannot yield a better upper bound than $\tau_{\mathrm{mix}} = O(T)$ where $m = O(T)$.

We briefly sketch an argument showing that it is possible to obtain a much better upper
bound via Lemma 2.2. Again fix $0 < \alpha < \frac{1}{2}$ and fix $J \subset G_m$ with $|J| \ge \frac{\alpha m}{2}$. Let $\{X_t\}_{t \ge 0}$
be a walk evolving on $\Omega$ according to $K_m$, and define $\Omega_{\mathrm{lower}} = \{(1, u) : u \in G_m\}$. Let $\{Y_t\}_{t \ge 0}$
be the trace of $\{X_t\}_{t \ge 0}$ on $\Omega_{\mathrm{lower}}$. Identifying elements of $\Omega_{\mathrm{lower}}$ with points of $G_m$ by the
map $(1, u) \mapsto u$, it can be verified that $\{Y_t\}_{t \ge 0}$ is a Markov chain with transition kernel $Q_m$. Let

\[ \tau'_J = \min\{t \ge 0 : Y_t \in J\}. \]

As noted earlier, the mixing time of $Q_m$ is $\Theta(\log(m))$. Thus by (2.2), there exists some
constant $0 < C < \infty$ that does not depend on $Y_0 = y$, $m$ or the particular set $J$ so that

\[ E_y[\tau'_J] \le C \log(m). \tag{2.8} \]

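Next, let

\[ \tau_J = \min\{t \ge 0 : X_t \in (\cup_{u \in J} \Omega_u) \cap \Omega_{\mathrm{lower}}\}. \]

Set $\eta(0) = -1$, inductively define $\eta(s + 1) = \min\{t > \eta(s) : X_t \in \Omega_{\mathrm{lower}}\}$, and set
$\kappa(s) = \max\{u : \eta(u) \le s\}$. We then have

\[ \tau'_J = \kappa(\tau_J). \tag{2.9} \]

Next, $\{\eta(s + 1) - \eta(s)\}_{s \in \mathbb{N}}$ is an i.i.d. sequence with mean $O(\epsilon^{-1})$. Thus, there exists a
constant $C_1 > 0$ so that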

maxP3,[k(C'iAi log(m)e“^) < Ai log(m)] < (2-10) 

for all $A_1 > 0$. Combining inequalities (2.8), (2.9) and (2.10) gives

\[ \tau_J = O(\log(m) \epsilon^{-1}), \tag{2.11} \]

uniformly in $X_0 = x \in \Omega$.

Using again the observation that r)(s + 1) — 77(5) is a sequence of i.i.d. random variables, 
each a sum of geometric random variables and with mean that is 0(e“^), we have for all 
A 2 > 0 , 

minP,,[{W}[i+C > ^'(l - > G"e-^^ (2.12) 

for some $0 < C, C', C'' < \infty$ that do not depend on $\epsilon$ or $m$. Recall that $\kappa_i(\cdot)$ denotes the
occupation time of $X_t$ on $\Omega_i$. Since $\varphi_{\max} = \Theta(1)$, combining inequalities (2.12) and (2.11),
we infer that there exists a constant $C_2$ such that

8 1 

maxPjn*g7{Ki(C2log(777)e"^) < [ —V5maxlog(777)]}] < -. (2.13) 

z£Vl Cl O 
By choosing t = r-f-(^max log(m)l and T = C 2 log(m)e ^ with C 2 sufficiently large, we obtain 

t 

max + maxF,[r\i^j{Ki{T) <t}] < - + - = 

/CG^:7r(UigjOi)>^ 7^ 8 8 4 

From Equation (2.5) and Lemma 2.2, it follows that $\tau_{\mathrm{mix}} = O(\log(m) \epsilon^{-1})$, which indeed is
the correct mixing time. For $\epsilon^{-1} \ll m$, this is much better than any bound obtainable by
applying Lemma 2.1.

Remark 2.4. The partition used in Example 2.3 was $\Omega_u = \{(1, u), (2, u)\}$, $u \in G_m$. A more
natural partition is $\Omega_m = \Omega_{\mathrm{lower}} \sqcup \Omega_{\mathrm{upper}}$. Lemma 2.1 applied to this partition indeed gives
the optimal bound of $O(\log(m) \epsilon^{-1})$ for the mixing time of $K_m$, without the restriction on
$\epsilon$ imposed in Example 2.3. However, our main point behind Example 2.3 is not about choosing partitions,
but to illustrate that for a given partition, Lemmas 2.1 and 2.2 can give completely different
answers.

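Fix $0 < \alpha < \frac{1}{2}$ and define

\[ \tau_{\mathrm{hit}} = \max_{I \subset [n] : \pi(\cup_{i \in I} \Omega_i) \ge \frac{\alpha}{2}} \max_{x \in \Omega} E_x[\tau_{\cup_{i \in I} \Omega_i}]. \tag{2.14} \]

We give a corollary to Lemma 2.2. Fix $i \in [n]$ and $X_0 = x \in \Omega_i$. Define the escape time

\[ \tau_{i,\mathrm{esc}} = \min\{t > 0 : X_t \notin \Omega_i\}. \tag{2.15} \]

Corollary 1. Assume that, for some $\epsilon, \delta > 0$,

\[ \min_{i \in [n]} \min_{x \in \Omega_i} P_x[\tau_{i,\mathrm{esc}} > \epsilon \varphi_{\max}] \ge \delta. \tag{2.16} \]

Following the notation of Lemma 2.2 and Equation (2.14), we have

\[ \tau_{\mathrm{mix}} = O(\epsilon^{-1} \delta^{-1} \tau_{\mathrm{hit}}\, n \log(n)). \]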

Proof. Let $I \subset [n]$, $A = \cup_{i \in I} \Omega_i$ satisfy $\pi(A) \ge \frac{\alpha}{2}$ and let $\kappa_I(s), \eta_I(s)$ be defined as in
Equation (1.4) with $\Omega_i$ replaced by $\cup_{i \in I} \Omega_i$. Let $\{Z_j\}_{j \in \mathbb{N}}$ be a sequence of i.i.d. geometric
variables with $P[Z_1 > 1] = e^{-1}$ and let $\{Z'_j\}_{j \in \mathbb{N}}$ be a sequence of i.i.d. Bernoulli random
variables with $P[Z'_1 = 1] = 1 - P[Z'_1 = 0] = \delta$.

We make two observations. The random variable e</9max Zj stochastically dominates the 
return times (rjiij) — rji^j — 1)) conditional on The random variable Zl is stochas¬ 
tically dominated by the random variable l^:^) conditional on where 

eu) = {Li.an.}. 

Thus £ denotes the event that the j’th visit to Ujg/Dj to be of length at least e(pmax. These 
two observations give, for any 0 < f < T G N and S > 1, 

p(maxfi:i(T -f- < P[k/(T -f- e</9max) < t] (2.17) 

V i€l |i I / 

S S 

< nY,Mj) - viU - 1)) > r] + P[E Mi, < 

i=l i=l 
max 






t 


eip. 


i=l i=l 

where the last inequality follows from Equation (1.9) and Equation (2.16). 

Fix 0 < Cl < oo and set t = Ci^praaxn\og{n). For any such choice of Ci, there exists a 
0 < C 2 < oo so that for S > C 2 Cie~^ 6 ~^n\og{n), 

^ f 1 

* " e^raJ - 16 ’ 

and for any such choice of 0 < Ci,C '2 < 00 , there exists a 0 < C 3 < cxd so that for 
T > CiC2C3e"^(5"Vhit^log(’^), 

s 


[e V^hit 




i=l 


16‘ 


□ 


Combining these two bounds with inequality (2.17) gives, for these choices of f, T, 

P[maxfi;i(T) < C'i(^maxlog(n)] < 

The result now follows from Lemma 2.2. 

Remark 2.5. Taking $\epsilon^{-1} = \varphi_{\max}$ and $\delta = 1$ in Corollary 1 gives

\[ \tau_{\mathrm{mix}} = O(\tau_{\mathrm{hit}}\, \varphi_{\max}\, n \log(n)), \tag{2.18} \]

which is visually similar to the main decomposition bounds in [JSTV04] and other papers
cited in the Introduction.

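The following bound on $\tau_{\mathrm{hit}}$ gives an easy way to use Corollary 1 when $n$ is small:

Lemma 2.6. Follow the notation of Corollary 1. Assume also

\[ \max_{i \in [n]} \max_{x \in \Omega_i} P_x[\tau_{i,\mathrm{esc}} > \epsilon \varphi_{\max}] \le 1 - \delta \tag{2.19} \]

for some $\epsilon, \delta > 0$. For $c > 0$, let $G_c$ be the directed graph with vertices $V = [n]$ and edges

\[ E_c = \Big\{(i, j) : \min_{x \in \Omega_i} P_x[X_{\tau_{i,\mathrm{esc}}} \in \Omega_j] \ge c\Big\}. \]

Let $D$ be the diameter of $G_c$. Then

\[ \tau_{\mathrm{hit}} \le \epsilon \varphi_{\max} D (c\delta)^{-D}. \]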

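Proof. Fix $I \subset [n]$ satisfying $\pi(\cup_{i \in I} \Omega_i) \ge \frac{\alpha}{2}$ and let $A = \cup_{i \in I} \Omega_i$. Equation (2.19) and the
definition of $G_c$ immediately give

\[ \min_{x \in \Omega} P_x[\tau_A \le \epsilon \varphi_{\max} D] \ge (c\delta)^D. \]

By Equation (1.8), this implies

\begin{align*}
\max_{x \in \Omega} E_x[\tau_A] = \max_{x \in \Omega} \sum_{t=0}^{\infty} P_x[\tau_A > t]
&\le \epsilon \varphi_{\max} D \sum_{k=0}^{\infty} \max_{x \in \Omega} P_x[\tau_A > k \epsilon \varphi_{\max} D] \\
&\le \epsilon \varphi_{\max} D \sum_{k=0}^{\infty} \Big(\max_{x \in \Omega} P_x[\tau_A > \epsilon \varphi_{\max} D]\Big)^k \\
&\le \epsilon \varphi_{\max} D \sum_{k=0}^{\infty} (1 - (c\delta)^D)^k = \epsilon \varphi_{\max} D (c\delta)^{-D}.
\end{align*}

Since this holds for all sets $I \subset [n]$ satisfying $\pi(\cup_{i \in I} \Omega_i) \ge \frac{\alpha}{2}$, this completes the proof. □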
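
To apply a bound of this type one needs the diameter of the directed graph $G_c$ and then a single arithmetic evaluation. The sketch below is illustrative and makes assumptions: the edge list is supplied by the user, the function names are hypothetical, and the bound is taken in the form $\epsilon \varphi_{\max} D (c\delta)^{-D}$ that the last display of the proof arrives at.

```python
from collections import deque
from fractions import Fraction

def diameter(n, edges):
    """Diameter of a directed graph on range(n); None if not strongly connected."""
    adj = {i: [] for i in range(n)}
    for i, j in edges:
        adj[i].append(j)
    worst = 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < n:
            return None
        worst = max(worst, max(dist.values()))
    return worst

def hit_bound(eps, phi_max, c, delta, D):
    """Evaluate eps * phi_max * D * (c * delta)^(-D)."""
    return eps * phi_max * D / (c * delta) ** D

# Hypothetical example: four parts arranged in a directed cycle.
D_cycle = diameter(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
bound = hit_bound(Fraction(1), 10, Fraction(1, 2), Fraction(1, 2), 2)
```

The factor $(c\delta)^{-D}$ grows exponentially in the diameter, which is why this route is only attractive when $n$, and hence $D$, is small.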

Lemmas 2.1 and 2.2 allow us to bound the mixing time of $\{X_t\}_{t \in \mathbb{N}}$ in terms of the
mixing times of the traces of $\{X_t\}_{t \in \mathbb{N}}$ as well as the occupation times $\{\kappa_i(T)\}$. The
assumption that we have good bounds on $\varphi_i$ is similar to the assumption of a good bound
on the relaxation times of restricted chains in [JSTV04], and it often holds. However, the
occupation times are much more difficult to understand than the relaxation times of the
projected chain used in [JSTV04]. The remainder of this paper is devoted to building tools
for bounding these occupation times.


3. Mixing Bounds Via Well-Covering Times 

The mixing bounds obtained in this section are based on the following observations: 

(1) If the occupation measure $\kappa_i(T)$ is large relative to $\varphi_i$ for some set $\Omega_i$, with high
probability it will also be large for ‘neighboring’ states $\Omega_j$ with $\bar{K}(i, j)$ large.

(2) If the high-probability event described above holds for all pairs i,j, then every set 
with large stationary measure will also have large occupation measure. 

We begin our development by defining a quantity that we call the well-covering time that
allows us to make this observation more precise.

Readers interested in quickly obtaining reasonable bounds may skip this definition on a
first reading of the paper: the comparison theory for well-covering times developed in Section
3.4, combined with the bound on the well-covering time of a simple example computed in
Section 3.2, allows users to estimate well-covering times without computing any directly.
Thus, the main result of this section (Theorem 2) can be used in simple examples without
using this definition. For more complicated examples, we still suggest estimating the
well-covering times directly only for simple examples as in Section 3.2, then using the comparison
bounds to relate these to the Markov chain of interest.


3.1. Well-Covering Times. Fix $n \in \mathbb{N}$ and let $Q$ be a reversible, irreducible, aperiodic
transition kernel on $[n]$ with stationary measure $\mu$. Fix constants $0 < B, t_1, \ldots, t_n < \infty$
and define the set $S_{B,T}$ to be the pairs $\kappa = (\kappa(1), \ldots, \kappa(n)) \in [0,1]^n$, $N = [N(i,j)] \in
[0,1]^n \times [0,1]^n$ satisfying $\sum_i \kappa(i) = 1$, $\sum_{i,j} N(i,j) = 1$ and the inequalities:

\N{i,j) - K{i)Q{i,j)\, \N{iJ) - n{j)Q{j,i)\ < 
-k{ 3)\, -K{i)\ < 

i 3 


(3.1) 


The terms $\kappa$ and $N$ in (3.1) represent rescaled counts of the empirical occupation measures
and transition counts for a Markov chain with transition kernel $K$ and projected kernel
$\bar{K} = Q$. For fixed constant $B$, the set $S_{B,T}$ represents all plausible joint values of $\kappa, N$ over a
run of the Markov chain for $T$ steps. For example, while it is possible to have $\kappa$ concentrated
on one part $\Omega_1$ of the partition of a Markov chain for a time $T \gg \varphi_1 \bar{K}(1, j)^{-1}$, this event
is extremely unlikely and so for fixed $B$, the pair $\kappa = (1, 0, \ldots)$, $N = 0$ will not be in $S_{B,T}$
for $T$ large.

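$S_{B,T}$ is of interest for $T$ large. In the limit $T = \infty$, the set of Equations (3.1) yield

\[ N(i, j) = \kappa(i) Q(i, j) = \kappa(j) Q(j, i) \tag{3.2} \]

and

\[ \sum_i N(i, j) = \kappa(j), \qquad \sum_j N(i, j) = \kappa(i). \tag{3.3} \]

Summing (3.2) with respect to the index $i$ and using (3.3) gives

\[ \sum_i \kappa(i) Q(i, j) = \sum_i N(i, j) = \kappa(j). \]

Thus the probability vector $\kappa$ satisfies $\kappa = \kappa Q$. Since $Q$ is irreducible and aperiodic, it
follows that $\kappa = \mu$ when $T = \infty$.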
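
The limiting identities can be verified on a concrete reversible kernel. In the sketch below, which is illustrative and uses a hypothetical three-state birth-death kernel, the limiting pair is $\kappa = \mu$ and $N(i,j) = \mu(i) Q(i,j)$; reversibility makes $N$ symmetric, and its row and column sums both recover $\mu$.

```python
from fractions import Fraction

def stationary_flows(Q, mu):
    """N(i, j) = mu(i) Q(i, j), the limiting transition frequencies."""
    n = len(Q)
    return [[mu[i] * Q[i][j] for j in range(n)] for i in range(n)]

# Hypothetical reversible example: birth-death chain with mu = (1/4, 1/2, 1/4).
Q3 = [[Fraction(1, 2), Fraction(1, 2), Fraction(0)],
      [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)],
      [Fraction(0), Fraction(1, 2), Fraction(1, 2)]]
mu3 = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
N3 = stationary_flows(Q3, mu3)
row_sums = [sum(N3[i]) for i in range(3)]
col_sums = [sum(N3[i][j] for i in range(3)) for j in range(3)]
```

Symmetry of `N3` is exactly the detailed-balance condition in (3.2), and the two marginal identities are the limiting form of (3.3).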

Definition 3.1 (Well-Covering Time). The well-covering time $\tau_{\mathrm{wc}} = \tau_{\mathrm{wc}}(t_1, \ldots, t_n, B)$
associated with the kernel $Q$ and the associated set $S_{B,T}$ satisfying (3.1) is defined as

Twc = min |t > 0 : V (k, iV) G Sb,t, Vi G [n], nfi) > ^| • 

This definition might seem slightly unwieldy at first glance. We give more intuition on
this quantity in the following sections.

3.2. Well-Covering Times for the simple Random Walk on a Graph. We compute 
the well-covering time of a simple random walk on a graph. The calculations in this section 
can be combined with the comparison results in Section 3.4 to obtain crude but useful bounds 
on the well-covering numbers of a much broader collection of Markov chains. 

Let $G = (V, E)$ be a tree with $|V| = n$ vertices, maximum degree less than $\Delta$ and diameter
$D$. Let $Q$ be the transition kernel on state space $[n]$ given by

Qifij) = 

for $i \ne j$ and $Q(i,i) = 1 - \sum_{j \ne i} Q(i,j)$.

This is a kernel associated with simple random walk on $G$, and its stationary distribution
is $\mu(i) = \frac{1}{n}$ for all $i \in V$.
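A small numerical sketch of such a kernel is given below. The off-diagonal weight did not survive in the text above, so the code assumes the natural choice $Q(i,j) = 1/\Delta$ on edges; this weight, the example tree and the function name are all assumptions. Any symmetric choice of edge weight gives the uniform stationary distribution.

```python
def tree_walk_kernel(edges, n, delta):
    """Kernel with Q(i, j) = 1/delta on edges (an assumed weight); leftover mass stays put."""
    Q = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        Q[i][j] = Q[j][i] = 1.0 / delta
    for i in range(n):
        Q[i][i] = 1.0 - sum(Q[i][j] for j in range(n) if j != i)
    return Q

# Hypothetical tree on 5 vertices: vertex 0 joined to 1, 2, 3 and vertex 3 joined to 4.
# The maximum degree is 3, which is less than delta = 4.
edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
Q = tree_walk_kernel(edges, 5, 4)

# The uniform measure should be stationary because Q is symmetric.
mu = [1.0 / 5] * 5
mu_Q = [sum(mu[i] * Q[i][j] for i in range(5)) for j in range(5)]
```

Requiring the maximum degree to be strictly less than $\Delta$ keeps every diagonal entry positive, so the kernel is aperiodic.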

Lemma 3.2. Let Twc be the well-covering time associated with the kernel Q. For any > 0, 

, (j), B) < ?7,max(lO^A^i?^D^,40). (3.4) 

Proof. Let T > n max(10^A^i?^D^, 40) and let (k, iV) G Sb,t- By the pigeonhole principle, 
there exists a vertex $i$ such that $\kappa(i) \ge \frac{1}{n}$. For $u, v \in G$, denote by $|u - v|$ the graph distance
on $G$, that is, the length of the shortest path in $G$ from $u$ to $v$. By induction on the quantity
$|i - j|$, we will show that

$$\kappa(j) \ge \kappa(i) e^{-\frac{16 B \Delta \sqrt{n}}{\sqrt{T}} |i - j|} \ge \frac{1}{4n}. \qquad (3.5)$$


It is clear that inequality (3.5) holds for $|i - j| = 0$. To prove the inequality for all $j$, fix
$0 < s \le D$ and assume that inequality (3.5) holds for all $j$ such that $|i - j| < s$; we will
prove that it holds for $j$ such that $|i - j| = s$. Fix $j$ so that $|i - j| = s$ and also fix $\ell \in G$
that satisfies $|i - \ell| = s - 1$ and $|\ell - j| = 1$; by the definition of the graph distance, at least
one such vertex exists. By inequality (3.1),


> NiiJ) - 


Vt 
Vt • 


Combining these two inequalities. 


K(j) > K{t) - — - = K{e){l - y==^j=). 


Vt 




(3.6) 


By the induction hypothesis (3.5), 


, , , 16BAy/n, , , 

Vj) > «(^)(1- ^ ) > «(*)e • 

Since T > and j < D, this implies 

^ Tn 

proving inequality (3.5) for the case $|i - j| = s$ and thus completing the induction argument.
Since $T \ge 4 n \varphi$, inequality (3.5) implies

$$\kappa(j) \ge \frac{\varphi}{T}$$

for all $j \in [n]$, completing the proof of inequality (3.4).


□ 


3.3. Well-Covering Time Bounds Mixing Times. For $1 \le i, j \le n$, define the first

flj —>■ Qj transition time tV = Q, then define subsequent transition times by 

= min{s > : X, G G fi,}. 

Define the number of such transitions before time T by 

Nij{T) = max{f > 0 : tV < T}. 

Theorem 2. Fix ^ < 1 — a < l3 < 1, fix I C [n] so that 7r(Ujg7f2i) > /? > |, and set 
7 = min(|, "^^~^ ) > 0. Let r^c be the well-covering time associated with the projected kernel 
K. Finally, for I < i < n, set ip[ = Then for T satisfying 

T > ■ ■ ■, \/8</3maxlog(64n2T)), 

we have 


We need the following two lemmas. 
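Lemma 3.3. Fix notation as in Theorem 2. Then for all $X_0 = x \in \Omega$, $t \in \mathbb{N}$ and all
$0 < c < \infty$,

$$P_x\Big[\Big|\frac{1}{t+1} N_{ij}(\kappa_i^{-1}(t)) - \bar{K}(i,j)\Big| > c\Big] \le 4 e^{-\frac{c^2 (t+1)}{8 \varphi_{max}}} \qquad (3.7)$$

and

$$P_x\Big[\Big|\frac{1}{t+1} N_{ij}(\kappa_j^{-1}(t)) - \bar{K}(j,i)\Big| > c\Big] \le 4 e^{-\frac{c^2 (t+1)}{8 \varphi_{max}}}. \qquad (3.8)$$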


Proof of Lemma 3.3. We first prove inequality (3.7). Define the function $f_{i,j} : \Omega_i \to [0,1]$ by

$$f_{i,j}(y) = P[X_1 \in \Omega_j | X_0 = y].$$

We note $\pi_i(f_{i,j}) = \sum_{y \in \Omega_i} \pi_i(y) f_{i,j}(y) = \bar{K}(i,j)$. By Corollary 2.11 of [Pau15], for all $x \in \Omega$,
$t \in \mathbb{N}$ and $c > 0$ we have




(t+i) 


(3.9) 


s=0 


Next, denote by iC* the transition kernel associated with the Markov chain We 

may write 


-j, 


where the kernels Kij{x, •) are dehned by 

Kij{x, •) = p[x« e -IXo = x,Xi e D,-]. 

For hxed i,j, let A,i(t) = IxtsOiA+ieo,'- Thus E[Dij(t)\Xt = x E Qi] = fi,j{x), and 
Xlo< 5 <K-hP:X.,Go (-^»j('5) “ is a martingale relative to the hltration cr({xi*^}s<t). 


^ 0 < 3 ;<K"ht) :XsGOi 

By Azuma’s martingale inequality, 

P4^|A'y(K."‘(i)) > c] = Py ^ 

s=0 


t + 1 


E - ki(X.))\ > c] 


o<s<K-^{t):Xaeni 


c2(i + l) 

< 2e- 


Combining this with inequality (3.9) and using the fact that $\varphi_{max} \ge 1$, we have shown
inequality (3.7):

„ 1 , _ , c^(t+l) c2{t+l) e^(t + l) 

Pa;[|-W,j(^i ^(^)) — A'(q j)| > c] < 2e ®‘7’max + 2e 4 < 4e S^^max . 

f A 1 

Inequality (3.8) follows immediately by applying inequality (3.7) to the time-reversal of the
chain and the proof is finished. □

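Lemma 3.4 (Local-to-Global Spreading). Fix notation as in Theorem 2. Then for any
$\epsilon > 0$ and $T > \tau_{wc}\big(8\varphi_1, \ldots, 8\varphi_n, \sqrt{8 \varphi_{max} \log(\frac{8 n^2 T}{\epsilon})}\big)$,

$$\max_{x \in \Omega} P_x\Big[\max_{i \in [n]} \frac{\kappa_i(T)}{\varphi_i} < 8\Big] \le \epsilon.$$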
Proof. Fix $T \in \mathbb{N}$, $X_0 = x \in \Omega$ and $\epsilon > 0$. For $1 \le i \ne j \le n$, denote by $A_{ij}$ the event that

$$|N_{ij}(T) - \kappa_i(T) \bar{K}(i,j)| > \sqrt{8 \varphi_{max} \log\Big(\frac{8 n^2 T}{\epsilon}\Big) \kappa_i(T)}$$

and denote by $B_{ij}$ the event that

$$|N_{ij}(T) - \kappa_j(T) \bar{K}(j,i)| > \sqrt{8 \varphi_{max} \log\Big(\frac{8 n^2 T}{\epsilon}\Big) \kappa_i(T)}.$$


By Lemma 3.3, 


^x[Aij] = F^[\Nij{T) - Ki{T)K{i,j)\ > y 8ip^^Aog{—^)Ki{T)] 

^ ^ _ I 8'n?T 

- Ki{T)K{iA)\ > y 8(^niaxlog(—^)Ki(T)|Ki(T) = t]F^[Ki{T) = t] 

8 (Prm.Aog{- 


t=0 

T 


'Sn^T-] 




t=0 


Vi 


\Ki{T) = f\Fa,[Ki{T) = t] 


X _ I 8rPT 

- “ ^(bi)l > Y 8V7maxlog(—^)] 


< 


2tP' 


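Taking a union bound yields $P_x[\cup_{1 \le i \ne j \le n} A_{ij}] \le \frac{\epsilon}{4}$ and thus $P_x[\cap_{1 \le i \ne j \le n} A_{ij}^c] \ge 1 - \frac{\epsilon}{4}$. The
same calculation implies that $P_x[\cap_{1 \le i \ne j \le n} B_{ij}^c] \ge 1 - \frac{\epsilon}{4}$. By construction, we also have

$$\Big|\sum_j N_{ij}(T) - \kappa_i(T)\Big|, \ \Big|\sum_i N_{ij}(T) - \kappa_j(T)\Big| \le 1.$$

Set

$$N(i,j) = \frac{N_{ij}(T)}{T}, \qquad \kappa(i) = \frac{\kappa_i(T)}{T}.$$

We have shown that, with at least $1 - \frac{\epsilon}{2}$ probability, the pair $(\kappa, N)$ belongs to the set $S_{B',T}$
associated with the kernel $\bar{K}$ and constant $B' = \sqrt{8 \varphi_{max} \log(\frac{8 n^2 T}{\epsilon})}$. Thus, by the definition
of the well-covering time, for $T > \tau_{wc}\big(8\varphi_1, \ldots, 8\varphi_n, \sqrt{8 \varphi_{max} \log(\frac{8 n^2 T}{\epsilon})}\big)$, with at least $1 - \frac{\epsilon}{2}$
probability, $\kappa(i) \ge \frac{8 \varphi_i}{T}$. This immediately yields that

$$P_x\Big[\max_{i \in [n]} \frac{\kappa_i(T)}{\varphi_i} < 8\Big] \le \frac{\epsilon}{2}.$$

Since $x \in \Omega$ was arbitrary, the proof is finished. □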

We finally give

Proof of Theorem 2. By Lemma 3.4, for T > rwc(8c![,</9'^,..., 8 d^ip'.^, V^Praa.AogV'^rPT)), 

^ . kAT) , 1 

max Pa; max-< 8c' < -. 

xeo i&i Lfi ^ 8 
Thus, by Lemma 2.1,


and the proof is finished.


T 

^mix _ 2 Cq ^ 


□ 


Combining inequality (3.4) with Theorem 2, we see that if a Markov chain $\{X_t\}_{t \in \mathbb{N}}$ has $Q$
as its projected chain, and the mixing times of all restricted chains are less than $\varphi_{max}$, the
mixing time of $\{X_t\}_{t \in \mathbb{N}}$ satisfies

, , = 0(nlog(n)A2zlVmax). (3.10) 

log(rmix) 


3.4. Comparison inequalities for well-covering times. Like the spectral gap, the covering
time can be difficult to bound for generic Markov chains. One of the main tools for
obtaining quantitative bounds on the spectral gap of a Markov chain is the use of comparison
theorems to relate complicated chains of interest to simpler chains that can be analyzed
directly (see e.g., [DSC93, DGJM06]). In this section, we give some basic comparison results
for the well-covering time, for the same purpose. The following ‘scaling’ bounds are
immediate:

• For any $\alpha \ge 1$, and any $t_1, \ldots, t_n, B$,

$$\tau_{wc}(\alpha t_1, \ldots, \alpha t_n, B) \le \alpha \tau_{wc}(t_1, \ldots, t_n, B). \qquad (3.11)$$

• For any $\alpha \ge 1$, $B > 0$ and $T \in \mathbb{N}$,

$$S_{\alpha B, \alpha^2 T} \subset S_{B,T},$$

and so for any $t_1, \ldots, t_n$,

$$\tau_{wc}(t_1, \ldots, t_n, \alpha B) \le \alpha^2 \tau_{wc}(t_1, \ldots, t_n, B). \qquad (3.12)$$

The next result shows that the well-covering time of a complicated kernel can be bounded 
in terms of the well-covering time of simpler kernels: 


Lemma 3.5. Let $Q$, $Q'$ be two reversible kernels on $[n]$ with the same stationary measure
$\mu$, and let $\tau_{wc}$, $\tau'_{wc}$ be their well-covering times. Assume that $Q(i,j) \ge Q'(i,j)$ for all $j \ne i$.

Then for any sequence $t_1, \ldots, t_n, B$,

$$\tau_{wc}(t_1, \ldots, t_n, B) \le 9 \tau'_{wc}(t_1, \ldots, t_n, B). \qquad (3.13)$$

Proof. We begin by showing that, under the same assumptions,

$$\tau_{wc}(t_1, \ldots, t_n, B) \le \tau'_{wc}(t_1, \ldots, t_n, 3B). \qquad (3.14)$$

Let $S_{B,T}$ and $S'_{B,T}$ denote the pairs $(\kappa, N)$ that satisfy inequalities (3.1) for the kernels $Q$, $Q'$
respectively. To prove inequality (3.14), it is enough to find, for any pair $(\kappa, N) \in S_{B,T}$ that
satisfies $\min_i \frac{\kappa(i)}{t_i} \le \frac{1}{T}$, some pair $(\kappa', N') \in S'_{3B,T}$ that satisfies $\min_i \frac{\kappa'(i)}{t_i} \le \frac{1}{T}$.
We now give such a construction. Set

$$\kappa'(i) = \kappa(i)$$

for all $1 \le i \le n$, set

$$N'(i,j) = N(i,j) - \kappa(i)(Q(i,j) - Q'(i,j))$$

for all $1 \le i \ne j \le n$, and finally set

$$N'(i,i) = \kappa'(i) - \sum_{j \ne i} N'(i,j).$$

Since $\min_i \frac{\kappa(i)}{t_i} \le \frac{1}{T}$, it is clear that $\min_i \frac{\kappa'(i)}{t_i} \le \frac{1}{T}$. Thus, it just remains to check that
$(\kappa', N') \in S'_{3B,T}$ by confirming that they satisfy both parts of inequality (3.1). To check the
first part of line 1 of inequality (3.1),

$$|N'(i,j) - \kappa'(i) Q'(i,j)| = |N(i,j) - \kappa(i)(Q(i,j) - Q'(i,j)) - \kappa(i) Q'(i,j)|$$
$$= |N(i,j) - \kappa(i) Q(i,j)|$$
$$\le \frac{B \sqrt{\kappa(i)}}{\sqrt{T}} = \frac{B \sqrt{\kappa'(i)}}{\sqrt{T}}.$$

To check the second part of line 1 of inequality (3.1), note that by reversibility and then the
first part of inequality (3.1),

$$Q(j,i) \Big|\kappa(i) \frac{\mu(j)}{\mu(i)} - \kappa(j)\Big| = |\kappa(i) Q(i,j) - \kappa(j) Q(j,i)| \le \frac{2 B \sqrt{\kappa(i)}}{\sqrt{T}}.$$

Thus,

$$Q'(j,i) \Big|\kappa'(i) \frac{\mu(j)}{\mu(i)} - \kappa'(j)\Big| \le \frac{2 B \sqrt{\kappa'(i)}}{\sqrt{T}}.$$

We conclude that

$$|N'(i,j) - \kappa'(j) Q'(j,i)| = |N(i,j) - \kappa(i)(Q(i,j) - Q'(i,j)) - \kappa(j) Q'(j,i)|$$
$$= |N(i,j) - \kappa(i) Q(i,j) + \kappa(i) Q'(i,j) - \kappa(j) Q'(j,i)|$$
$$\le |N(i,j) - \kappa(i) Q(i,j)| + Q'(j,i) \Big|\kappa'(i) \frac{\mu(j)}{\mu(i)} - \kappa'(j)\Big| \le \frac{3 B \sqrt{\kappa'(i)}}{\sqrt{T}}.$$

The second part of inequality (3.1) is immediate. This completes the proof of inequality
(3.14); the result now follows from inequality (3.12). □
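The construction in this proof is mechanical enough to check numerically. The following sketch applies the map $(\kappa, N) \mapsto (\kappa', N')$ to a hypothetical pair and two hypothetical symmetric kernels with $Q(i,j) \ge Q'(i,j)$ off the diagonal; all numerical values and the function name are assumptions chosen for illustration.

```python
def comparison_map(kappa, N, Q, Qp):
    """Map (kappa, N) for kernel Q to (kappa', N') for kernel Qp, as in the proof above."""
    n = len(kappa)
    kp = list(kappa)  # kappa' = kappa
    Np = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                Np[i][j] = N[i][j] - kappa[i] * (Q[i][j] - Qp[i][j])
        # Diagonal entry chosen so that row i of N' sums to kappa'(i).
        Np[i][i] = kp[i] - sum(Np[i][j] for j in range(n) if j != i)
    return kp, Np

# Hypothetical symmetric kernels with uniform stationary measure; Qp is the lazier one.
Q = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
Qp = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]

# Hypothetical occupation vector and transition matrix (rows of N sum to kappa).
kappa = [0.5, 0.3, 0.2]
N = [[0.26, 0.12, 0.12], [0.08, 0.15, 0.07], [0.05, 0.06, 0.09]]

kp, Np = comparison_map(kappa, N, Q, Qp)
```

The off-diagonal deviation $N'(i,j) - \kappa'(i) Q'(i,j)$ coincides with $N(i,j) - \kappa(i) Q(i,j)$, which is why the first part of (3.1) carries over with the same constant $B$.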


In the other direction, making a chain lazier cannot greatly impact the well-covering time.
This requires an intermediate lemma; we give an abbreviated proof, as the details may be
checked exactly as in the proof of Lemma 3.14:


Lemma 3.6 (Well-Behaved Covering Set). Define 


'B-b,t = G Sb,t '■ max < !}• 

Q{i,3) 


(3.15) 


Then 


min < T > 0 : V (k, A^) G Sb,t^ Vi G [n], K{i) > 


T 


ti 


= min T > 0 : V (k, A^) G 'JZb,t, Vi G [n], K{i) > ^ f • 
Proof. Since $\mathcal{R}_{B,T} \subset S_{B,T}$, it is clear that the right-hand side is at least as large as the
left-hand side. To prove the reverse inequality, we define a map $F = (F_1, F_2)$ from $(\kappa, N) \in$
Sb,t to 71b,t by setting Fi{n) = k, F2{N){i,j) = for i 7 ^ j and then 

F 2 (At)(i,i) = 1 — '^j^iF 2 {N){i,j). This map sends elements of Sb,t to TZb,t- Also, if 

miuj ^ then min* This completes the proof. □ 

The following result goes in the ‘opposite direction’ from Lemma 3.5: 

Lemma 3.7 (Laziness and Well-Covering Times). Let $Q$ be a $\frac{1}{2}$-lazy reversible kernel with
stationary measure $\mu$, let $\mathrm{Id}$ be the identity kernel, let $0 < \alpha < 1$, and let $Q' = \alpha Q + (1 - \alpha) \mathrm{Id}$.
Let $\tau_{wc}$, $\tau'_{wc}$ be the well-covering times of $Q$, $Q'$. Then for any sequence $t_1, \ldots, t_n, B$,

$$\tau'_{wc}(t_1, \ldots, t_n, B) \le \alpha^{-2} \tau_{wc}(t_1, \ldots, t_n, B). \qquad (3.16)$$

Proof. Let Sb,t and S'^ denote the pairs {k, N) that satisfy inequalities (3.1) for the kernels 
Q, Q' respectively and let TIb,t and 77^ rp be as in Equation (3.15). We then dehne a bijection 
F = (Fi, F 2 ) from TZb^t to by setting F(k, N) = (k', N') where 

K,'{i) = K{i) 


for all 1 < i < n. 


N'{i,j) = aN{i,j) 

for all i 7 ^ j, and N'{i,i) = 1 — all 1 < i < n. This map is injective, 

and its image is contained in PaB,T- To check that it is in fact bijective, we dehne a map 
F~^ = (Ff^,F 2 "^) from PaB,T to Pb,t by setting F~^{k',N') = {k, N) where K{i) = K{i)' 
for all 1 < i < n, N{i,j) = a~^N{i,j) for all i 7 ^ j, and N{i,i) = 1 — for all 

1 < i < n. It can be verihed that F~^ is an injection and that F o F~^ is the identity. Since 
Fi(k) = n, this implies that 

min |t > 0 :\/{k,N) e Pb,t, Vi G [n], K{i) > 

= minjT > 0 : V (k. At) e Vi G [n], ^(i) > 

By Lemma 3.6, this implies 

rwc(tl) • • • ) fo, oF) ^ '?wc(tl) • • • ) tn, F). 

Combining this with inequality (3.12) completes the proof. □ 
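For concreteness, the lazier kernel $Q' = \alpha Q + (1 - \alpha)\mathrm{Id}$ of Lemma 3.7 can be built and sanity-checked as follows. The 3-state kernel, its stationary measure and the function name are hypothetical choices for illustration only.

```python
def lazify(Q, alpha):
    """Return alpha * Q + (1 - alpha) * Id."""
    n = len(Q)
    return [[alpha * Q[i][j] + (1.0 - alpha) * (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]

# Hypothetical 1/2-lazy reversible kernel on 3 states with stationary measure mu.
Q = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
mu = [0.25, 0.5, 0.25]

alpha = 0.5
Qp = lazify(Q, alpha)
```

The off-diagonal entries are scaled by $\alpha$ and detailed balance with respect to $\mu$ is preserved, so $Q$ and $Q'$ share a stationary measure. This is consistent with the rescaling $N'(i,j) = \alpha N(i,j)$ used in the proof.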


As mentioned before, Lemmas 3.5 and 3.7 are meant to be simple analogues of the
well-developed comparison theory for Markov chains [DGJM06]. The bounds in this note can
already be combined with Lemma 3.2 to obtain at least some bound on the well-covering time
of any irreducible $\frac{1}{2}$-lazy Markov chain, though this bound is often very conservative. For
example, if the Markov chain exhibits drift towards a small number of states (e.g., the KCIP
chain in Example 5.1), the associated well-covering time can be much closer to the mixing
time of the kernel $Q$ than would be suggested by comparison with Lemma 3.2. This same
strong dependency of our bounds on the stationary distribution of the underlying Markov
chain occurs for the usual comparison theory as well.
4. Stronger Mixing Bounds with Additional Regularity

We discuss additional assumptions that can give stronger bounds on the mixing time, with 
an emphasis on bounds that are effective before the occupation measures of ‘most’ parts of 
the partition are large. These bounds are most useful when n is large. 

4.1. Drift Bound. One of the main difficulties in using the bounds in [MR00, JSTV04,
MR02, MR06, MY09], as well as our bounds in Section 3, is their sensitivity to poor mixing
on sets that have small measure under $\pi$. The simplest way to circumvent this difficulty is
through a ‘drift condition.’

Neither drift conditions nor attempting to ignore sets of small measure when bounding
mixing times are new ideas; we discuss them here because they are popular and useful in
the context of this paper, not novel. Drift conditions were famously used in [Ros95] and
many subsequent papers to derive general mixing bounds for chains. A central part of the
probabilistic bound on the mixing time given in [BSZ11] involves showing that certain sets of
small measure can (eventually) be ignored, and [KO15] explicitly discusses this issue in the
context of path-coupling arguments (see [BD97]). The literature on ignoring sets of small
measure when proving Poincaré and log-Sobolev inequalities seems smaller (however, see
[Sch02] for one example).

Fix constants $0 < a < 1$, $0 < b < \infty$, and $k \in \mathbb{N}$ and let $V : \Omega \to \mathbb{R}^{+}$ be a function that
satisfies the drift condition

$$E[V(X_{t+k}) | X_t] \le (1 - a) V(X_t) + b \qquad (4.1)$$

and has $\max_{x \in \Omega} V(x) = V_{max} < \infty$. Inequality (4.1) is a special case of the popular drift
condition used in [Ros95], and the function $V$ is often called a Lyapunov function. Define
the sets

$$\mathcal{C}(C) = \{\omega \in \Omega : V(\omega) \le C\}. \qquad (4.2)$$
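On a small state space the drift condition (4.1) can be verified directly by applying the kernel $k$ times to $V$. The sketch below does this for a hypothetical walk on $\{0, \ldots, 4\}$ with drift towards $0$ and the Lyapunov function $V(x) = 2^x$; the kernel, the constants and the function names are assumptions made for illustration.

```python
def drift_holds(K, V, a, b, k):
    """Check E[V(X_{t+k}) | X_t = x] <= (1 - a) V(x) + b for every state x."""
    n = len(K)
    w = list(V)
    for _ in range(k):
        # One application of the kernel: w(x) <- sum_y K(x, y) w(y).
        w = [sum(K[x][y] * w[y] for y in range(n)) for x in range(n)]
    return all(w[x] <= (1.0 - a) * V[x] + b + 1e-12 for x in range(n))

def sublevel_set(V, C):
    """States with V(x) <= C, i.e. the set in (4.2)."""
    return [x for x in range(len(V)) if V[x] <= C]

# Hypothetical walk on {0, ..., 4}: step down w.p. 0.6, up w.p. 0.2, otherwise stay.
n = 5
K = [[0.0] * n for _ in range(n)]
for x in range(n):
    if x > 0:
        K[x][x - 1] = 0.6
    if x < n - 1:
        K[x][x + 1] = 0.2
    K[x][x] = 1.0 - sum(K[x])
V = [2.0 ** x for x in range(n)]

ok_with_b = drift_holds(K, V, a=0.1, b=0.3, k=1)
ok_without_b = drift_holds(K, V, a=0.1, b=0.0, k=1)
small_set = sublevel_set(V, 4.0)
```

In this example the additive constant $b$ is needed only at the state $0$, where the walk cannot move further down.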

We have: 


Theorem 3 (Decompositions and Drift Condition). Let K be a transition kernel with state 
space D and let V,a,b,k satisfy inequality (4.1). Fix ^ < M < oo and let fl' = C{M). 
Finally, let be the mixing time of the trace of K on Q'. Then the mixing time Tmw of K 
satisfies: 


Tmix Y 


16c. 


,16 


max(—rC^, k log(16I4iax), 8 log(16)). 
oa c' 


Proof. The proof is given in Appendix B. 


□ 


4.2. Regularity and Contractivity Assumptions. One of the main contributions of
[JSTV04] was the use of regularity assumptions to strengthen their bounds. In this section,
we consider one useful and strong assumption that has been satisfied in practice (see e.g.,
Lemma 4.5 of [DLP10]). Our assumptions in this section are closely related to the notion of
metastability; see e.g. the very recent [Zha15] for bounds on the spectral gap and log-Sobolev
constants of metastable chains that are useful in similar situations.

^See Corollary 1 and Lemma 2.6 for a simple bound based on a regularity condition that looks more 
similar to the bounds in [JSTV04]. 
Define a less lazy version of $\bar{K}$ by setting

$$K_{LL}(i,j) = \frac{\bar{K}(i,j)}{2(1 - \bar{K}(i,i))} \qquad (4.3)$$

for $i \ne j$ and $K_{LL}(i,i) = 1 - \sum_{j \ne i} K_{LL}(i,j)$.
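The effect of this rescaling is easy to see numerically. The sketch below reads (4.3) as $K_{LL}(i,j) = \bar{K}(i,j) / (2(1 - \bar{K}(i,i)))$ for $i \ne j$, which is an assumed reading of the display, and applies it to a hypothetical, very lazy projected kernel; the function name and the matrix entries are illustrative.

```python
def less_lazy(Kbar):
    """Rescale off-diagonal entries so that every state holds with probability 1/2."""
    n = len(Kbar)
    KLL = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Requires Kbar[i][i] < 1, i.e. state i is not absorbing.
                KLL[i][j] = Kbar[i][j] / (2.0 * (1.0 - Kbar[i][i]))
        KLL[i][i] = 1.0 - sum(KLL[i][j] for j in range(n) if j != i)
    return KLL

# Hypothetical projected kernel that holds in place almost all of the time.
Kbar = [
    [0.99, 0.01, 0.0],
    [0.004, 0.99, 0.006],
    [0.0, 0.02, 0.98],
]
KLL = less_lazy(Kbar)
```

Under this reading every diagonal entry of $K_{LL}$ equals $\frac{1}{2}$, while the relative odds of the possible jumps out of each state are unchanged.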

For any pair of measures $\mu$, $\nu$ on a metric space $(\mathcal{X}, d)$, denote by $\Pi(\mu, \nu)$ the collection of
all pairs of random variables $(X, Y) \in \mathcal{X}^2$ that have marginal distributions $X \sim \mu$, $Y \sim \nu$.
Recall that the Wasserstein metric on measures on a metric space $(\mathcal{X}, d)$ is given by

$$W_d(\mu, \nu) = \inf_{(X,Y) \in \Pi(\mu, \nu)} E[d(X,Y)].$$

Recall the escape time $\tau_{i,esc}$ from (2.15).


Definition 4.1 (Contraction Condition). Let d be a metric on [n]. For Xq = x E Lli, let 
be the distribution o/P(Xt-._^^J G [n]\{i} and let ^x{-) = that the kernel 

K satisfies a contraction condition with coefficients 0<fi<a<l if 


max Wdffix, Ly) < (1 - a)d{i, j) + /3. (4.4) 

xGQi^yGQj 

The parameter $\beta < 1$ plays a role in Theorem 4 similar to the role of the regularity
parameter $\gamma$ in Theorem 1 of [JSTV04]. The purpose of the Definitions (4.3) and (4.4) is
to allow us to couple a suitable sped-up copy of the function $\{\mathcal{P}(X_t)\}_{t \ge 0}$ to a Markov chain
$\{Z_t\}_{t \ge 0}$ evolving according to $K_{LL}$ so that $d(\mathcal{P}(X_t), Z_t)$ is often small; this is made precise
in inequality (4.7).

We obtain the following bound for Markov chains satisfying (4.4):


Theorem 4. Let K be a Markov chain on state space D = and let d be a metric on 

[n] that satisfies 1 < d{i,j) < < 00 for all i j■ Assume that K satisfies inequality 

(4.4) for some 0 < /5 < | < | and that it also satisfies 

minP3.[ri,esc > ai^p^^\og{n)] > 

x£Ui 

maxPj,[ri,esc > a2</3max log(n)] <1-52 

xGfli 


for some 0 < oi, 02 , 5i, 62 - Then the mixing time Tmix of K satisfies 

C 3 


T-mix < Cl log(n) max(C 293 + 1, 


log(l - a) 

where Tp is the mixing time of the kernel TCll defined in (4.3), Ci = 

))~\ 7 = I - C 2 = log 2 ( 7 ), C 3 = log(^) + log(Dmax) and e = 


(4.6) 

C 26 ff log(16)(log(l- 

- 1 _ T1 
■4 16' 


Remarks 4.2. We point out that this result is easier to apply than it might appear at first
glance:

• Since $a_1, a_2$ are arbitrary (and can depend on $n$), an inequality of the form (4.5) will
be satisfied for any ergodic Markov chain on a finite state space.

• The popular Total Variation distance is in fact a Wasserstein distance. Thus, Inequality
(4.4) will also be satisfied by all sufficiently large powers of any ergodic
Markov chain $K$ on a finite state space. See Example 4.3 for a general approach to
proving such inequalities for more natural metrics and with $k = 1$.
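The second remark can be illustrated on a finite set. With the discrete metric $d(x,y) = 1_{x \ne y}$, the expected distance under a coupling is $P[X \ne Y]$, and a maximal coupling attains the total variation distance. The sketch below builds one such coupling for two hypothetical distributions; the function names and the numbers are assumptions. It shows only that the infimum in the Wasserstein definition is at most the total variation distance; the matching lower bound is the standard coupling inequality.

```python
def total_variation(mu, nu):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))

def maximal_coupling(mu, nu):
    """Joint law with marginals mu and nu that puts mass min(mu, nu) on the diagonal."""
    n = len(mu)
    joint = [[0.0] * n for _ in range(n)]
    for i in range(n):
        joint[i][i] = min(mu[i], nu[i])
    tv = total_variation(mu, nu)
    if tv > 0:
        excess_mu = [mu[i] - joint[i][i] for i in range(n)]
        excess_nu = [nu[i] - joint[i][i] for i in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j:
                    # Spread the leftover mass of mu over the leftover mass of nu.
                    joint[i][j] = excess_mu[i] * excess_nu[j] / tv
    return joint

# Hypothetical pair of distributions on three points.
mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
joint = maximal_coupling(mu, nu)

# Expected discrete-metric distance under this coupling, i.e. P[X != Y].
cost = sum(joint[i][j] for i in range(3) for j in range(3) if i != j)
```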
Proof. We begin by constructing a coupling of the Markov chain $\{X_t\}_{t \ge 0}$, with state space
$\Omega$, to a Markov chain evolving according to $K_{LL}$ on state space $[n]$.

Fix Xq = X, let = 0, and dehne inductively = niin{t > : V{Xt) ^ 

V{X (s) )}. For t E N, dehne Yf = V{X (t) ). Let {rjt}t>o be a sequence of i.i.d. geometric 

"^exit "^exit 

random variables with mean 2, let = min{j > 0 : h* ^ s}, and for <t< A*^®’''^\ 

dehne Yt = Y^. We note that {Yt}t>o is not a Markov chain. 

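Denote by $\{Z_t\}_{t \in \mathbb{N}}$ a Markov chain on $[n]$ evolving according to the kernel $K_{LL}$ and started
according to the distribution $\bar{\pi}[i] = \pi(\Omega_i)$. By the assumption made in Equation (4.4), it is
possible to couple $\{Y_t\}_{t \in \mathbb{N}}$ and $\{Z_t\}_{t \in \mathbb{N}}$ so that

$$E[d(Y_{t+1}, Z_{t+1}) | Y_t, Z_t] \le (1 - \alpha) d(Y_t, Z_t) + \beta. \qquad (4.7)$$

Under this coupling, for any $t \in \mathbb{N}$,

$$P[Y_t = Z_t] \ge P[d(Y_t, Z_t) < 1]$$
$$\ge 1 - E[d(Y_t, Z_t)]$$
$$\ge 1 - \frac{\beta}{\alpha} - (1 - \alpha)^t D_{max}. \qquad (4.8)$$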

Fix any subset / C [n] satisfying 7r(Ujg/Di) > | § and let r/ = min{t >0 : Yt E I}. 

Then for any starting points Yq = y,Zo = z and any T > max(j^ log(^), 
have by inequality (4.8) 


P[r/ < T] > F[Yt E I] 

> F[Zt eI]- F[Yt ^ Zt] 


> 

> 



Since this holds uniformly over initial points Yq, Zq, we have for fc G N 


maxP„[r 7 - > k —] < e 

ye[n] 7 ' 


(4.9) 


Let Tj = min{t > 0 : Xt G Uig/Dj}. Combining inequalities (4.5) and (4.9), we have for 
G M 


. , 16T a2(Pmax , / M t™ r_ , 4T, ^ ^ i\k—]) , 16T a2'^r. 

maxF Jtj > k ---log(n) < max Pa, r/ > k — + max Pa, T;. ^ > k -r- 

02 xGQ 'y xGQ 


7 


■log(n)] 


1 I 4fcT I 

< e ^ + e 1 7 1 


< 2e 


-k 


(4.10) 


where the second-last line follows from standard concentration inequalities for i.i.d. sums of 
geometric random variables. For C G N, let 

'^I,COV (C) = min{t > 0 : ^ ly^ei > C} 
be the first time that $X_t$ has entered $\cup_{i \in I} \Omega_i$ at least $C$ times. By inequalities (4.10) and
(1.8), we have


maxEa;[r/,cov(C)] < log(n)C'. 

xGfi y 02 


(4.11) 


By inequalities (4.5) and (1.9), 


P3,[Vi e I, Ki(r/,cov(C)) < 


e(pr 


(1) , 6 


log(8n)] < maxP 3 ,[rgyt < 


d xen c' 


= (1 - minP3,[ri,esc > 


■ log(8n)] 


c 


e(p^ 


xGD, 

r-^i. 


log(8n)]) 


c 


< (l-(5i“i^' ) 


c 


(4.12) 


for all C* G N. 

Combining inequalities (4.11) and (4.12) with Markov’s inequality and setting C = log(16) (log(l — 

r_8e 1 

)) \ for all t > log(n)C, we have that 


maxPa.[Vi G /, Ki{t) < ^^^log(8n)] < 

x&Q 5 

Since this applies for all / C [n] that satisfy 7r(Uig/fli) > \ ~ ^ \ ~ |(| ~ „)) fhe result 

now follows from Lemma 2.2. □ 


Example 4.3. The constants $\alpha$, $\beta$ associated with inequality (4.4) are generally very poor
for any partition of $\Omega$. However, in some situations, a trace of the Markov chain onto a set
with large stationary measure will satisfy inequality (4.4) with much larger constants. We
give a prototypical example for which this small trick is useful, beginning with a discussion
of why the trick is needed. We leave the proof of all of the claims made in this example to
Appendix C.

Fix integers $\ell, m \ge 2$, let $\Omega = \{0, 1, 2, \ldots, 2\ell - 1\}^m$ be the $m$-dimensional torus with
side length $2\ell$, and let $Q$ be the proposal distribution


Q((xi, . . . , Xm)) iyi) ■ ■ ■ ) J/m)) 


1 

3m 


m 

i=i 


where addition is taken in the group $\mathbb{Z}_{2\ell}$. Define the function $H$ on $\Omega$ by

$$H(x_1, \ldots, x_m) = \sum_{i=1}^{m} \min(x_i, 2\ell - 1 - x_i).$$

Next, fix $C > 1$, let

tt(x) oc 


(4.13) 


be a distribution, let $K$ be the kernel of a Metropolis-Hastings Markov chain with proposal
kernel $Q$ and target distribution $\pi$, and for $z \subset [m]$ (we allow $z = \emptyset$ as well) define

$$\Omega_z = \{x \in \Omega : \forall i \notin z, \ x_i \le \ell - 1; \ \forall i \in z, \ x_i \ge \ell\}.$$

Thus, $\Omega = \sqcup_{z \subset [m]} \Omega_z$. Finally, let $d(z, z') = |z \Delta z'|$ be the usual Hamming distance on subsets
of $[m]$. We are interested in the mixing of the above Markov chain when $k$, $\ell$ and $C > 6$ are
held constant and $m$ goes to infinity.

We give an informal argument that inequality (4.4) cannot be satisfied with useful constants.
Let $\{X_t\}_{t \ge 0}$, $\{Y_t\}_{t \ge 0}$ be two copies of the Markov chain started at points
Xi = {i- e ^0 and yi = {i- !)!*>- G ^0. For 2 ; G Xq, Yq, IP 2[r0,esc = 1] ~ and if 

Xi, Yi ^ r20, they must be in different partitions. Thus, inequality (4.4) cannot be satished 
for any /3 ^ By standard arguments concerning the contraction of simple random walk 
on the hypercube (see Example 8 of [OHIO]), inequality (4.4) cannot be satished for any 
a S> These constants do not satisfy the conditions of Theorem 4. 

In this example (and many others), the constants can be substantially improved by taking
a trace of this chain. For $0 \le k \le \ell - 1$ and a subset $z \subset [m]$, define

$$\Omega_z^{(k)} = \{x \in \Omega : \forall i \notin z, \ x_i \le \ell - 1 - k; \ \forall i \in z, \ x_i \ge \ell + k\}.$$

Set ■ Let K be the transition kernel of the trace of K on We show 

that any hxed £,k > 2, inequality (4.4) is satished for the kernel K with constants 

« = (1 - ^)(1 + 0(1)),/? = 0(1) 

for $C > 6$ as $m$ goes to infinity (see Equation (5.20) in Appendix C). Here $C$ is the constant
appearing in Equation (4.13). We also prove $\pi(\Omega^{(k)}) = 1 - o(1)$ for $C > 6$ as $m$ goes to
infinity (see Equation (5.24)). As shown below, these constants are good enough to be useful.

We now show how Theorem 4 can be applied to our example as $m$ goes to infinity for
fixed $\ell \ge 3$, $1 \le k \le \ell - 1$, and for $6 < C < \infty$. We show that for $m$ sufficiently large this
example satisfies the conditions of Theorem 4 with constants


a = 1 




2 m 




_i« _ = (0,0, ..., 0)]log2(e) ^ ^ 

( 1 \ — 10(22 — ^ 1 , 


= 82 = ^ < 2mlog(m), T = 8mlog(8m). 


(4.14) 


From Theorem 4, we conclude

$$\tau_{mix} = O(m \log(m) E[\tau_{\emptyset,esc} | X_0 = (0, 0, \ldots, 0)]).$$

This is a reasonable estimate. Indeed, by considering the escape time from any part of the 
partition, it can be verihed that mE[r0,esc|A'o = (0, 0,..., 0)] = Thus, our estimate is 

oh by at most a factor of log(m). By inequality (5.32), m^~‘^ = O(E[r0^esc|-Y^o = (0, 0,..., 0)]), 
and so this factor of log(m) is small relative to the mixing time. 


5. Applications 

In this section, we apply our results to two Markov chains, illustrating some situations 
under which our bounds work well. 
5.1. Pince-Nez Graph. We carefully study the symmetric random walk on the $2m$-vertex
“pince-nez” graph mentioned in the Introduction. Fix $m \in \mathbb{N}$ and define $\Omega_1 = [m]$, $\Omega_2 =
\{m+1, m+2, \ldots, 2m\}$ and $\Omega = \Omega_1 \cup \Omega_2$. This was also the partition considered in [JSTV04].
We will show that its mixing time (and thus its relaxation time) is $O(m^2)$.

For X ^ y and x,y E or x,y ^ 112, we set K{x,y) = | if |x — ?/| = 1 or {x,y} G 
{{1, m}, {m + 1, 2m}}. We also set iF(l, m -f 1) = iF(m -|- 1,1) = |. For all other x ^ y, we 
set K{x,y) = 0. To complete the dehnition of the kernel, set K{x,x) = 1 — 

We claim that $\tau_{mix} = O(m^2)$. We will prove this using Lemma 2.1. By Example 10.20 of
[LPW09],

$$\varphi_{max} = O(m^2). \qquad (5.1)$$


For all T G M, 


maxPa;[r/i\ > T -|- 1] < (1 


- minP,j,[r{m+i} < T]) 

0 x£Q2 


5 

6 


- max Pa, [r/m+ir > Tl < — I —max 
6 xe02 J-6 6 xe02 


T 


(5.2) 


By Example 10.20 of [LPW09], maXx&n 2 ^x[T{m+i}] = maXa,gOa Ea,[r{i}] = ©(m^), and so 
inequalities (5.2) and (1.8) imply maxa;en Ea;[r{i}] = O(m^). By the symmetry of the problem, 
for all T G N 

Ei|'ti(r)l > (5-3) 

This implies that, for all T G N 

mmEa,[fi:i(T)] > ^ minEa,[(T - r{i})] = ^ - ©(m^)- 

Applying this bound and Markov’s inequality, we conclude that for all Ci > 0, there exists 
G 2 > 0 so that 

maxP[fi:i(G2m^) < Gim^] < (5.4) 

By the symmetry of the problem, 

m^P[K2(G2m^) < Gim^] < - (5.5) 

for the same $C_1, C_2$. The conclusion that $\tau_{mix} = O(m^2)$ follows immediately from applying
Lemma 2.1 with bounds (5.1), (5.4), (5.5).

This bound on the mixing time immediately implies that the relaxation time of the walk
is $O(m^2)$ as well. It is straightforward to check (e.g., by the central limit theorem) that the
mixing time $\tau_{mix}$ of the walk satisfies $m^2 = O(\tau_{mix})$. From the symmetry of the problem and
the fact that the relaxation time of simple random walk on the cycle is $\Theta(m^2)$, the relaxation
time of this walk is also at least on the order of $m^2$.

This example is, of course, simple enough to be analyzed directly. We include it to illustrate
the fact that we can obtain qualitatively better bounds than previous decomposition bounds.
5.2. Toy Version of the Kinetically Constrained Ising Process. Our interest in
decomposition bounds was motivated by our study of the following kinetically constrained Ising
process, which originated in [AF84]:

Example 5.1 (Kinetically Constrained Ising Processes (KCIP) and Partitions). Fix a
constant $c > 0$, a graph $G = (V, E)$ with $|V| = m$ vertices and a function $\mathcal{N} : V \to 2^V$.
A KCIP associated with graph G and density p = ^ is a Markov chain on O = 

$\{0,1\}^V \setminus \{(0, 0, \ldots, 0)\}$ that has a transition kernel defined by the following algorithm for
constructing $X_{t+1}$ from $X_t$:

(1) Choose a vertex $v \in V$ and a number $\lambda \in [0,1]$ uniformly at random.

(2) If there exists $u \in V$ such that $u \in \mathcal{N}(v)$ and $X_t[u] = 1$, set $X_{t+1}[v] = 1$ if $\lambda \le p$
and set $X_{t+1}[v] = 0$ if $\lambda > p$. Set $X_{t+1}[w] = X_t[w]$ for all $w \ne v$.

(3) Otherwise, if $X_t[u] = 0$ for all $u \in V$ such that $(u,v) \in E$, set $X_{t+1}[w] = X_t[w]$ for
all $w \in V$.
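The three steps above can be sketched in a few lines of Python. The cycle graph, the constant $c = 2$, the density scaling $c/m$ and the function name are assumptions chosen for illustration; the neighbourhood of a vertex is taken to be its graph neighbours, excluding the vertex itself.

```python
import random

def kcip_step(state, neighbors, density, rng):
    """One KCIP update: resample a uniformly chosen vertex only if some neighbor is occupied."""
    v = rng.randrange(len(state))   # step (1): uniform vertex
    lam = rng.random()              # step (1): uniform number in [0, 1)
    new_state = list(state)
    if any(state[u] == 1 for u in neighbors[v]):
        # Step (2): an occupied neighbor exists, so vertex v is resampled.
        new_state[v] = 1 if lam <= density else 0
    # Step (3): otherwise the configuration is left unchanged.
    return new_state

# Hypothetical example: cycle on m = 8 vertices with density c / m, c = 2 (assumed scaling).
m, c = 8, 2.0
neighbors = {v: [(v - 1) % m, (v + 1) % m] for v in range(m)}
rng = random.Random(1)
state = [1] + [0] * (m - 1)
trajectory = [state]
for _ in range(2000):
    state = kcip_step(state, neighbors, c / m, rng)
    trajectory.append(state)
```

Because a vertex is only resampled when one of its neighbours is occupied, and that neighbour is not touched by the update, the all-zero configuration is never reached from a nonzero start. This is consistent with removing $(0, 0, \ldots, 0)$ from the state space.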

It is natural to try to analyze this Markov chain by partitioning according to the number
of non-adjacent particles, fixing $n$ and defining for $1 \le k \le n$

= {X G {0,: Y,X[v] = k,Y, E (5.6) 

v^V u£GvGN{u) 


as well as the ‘remainder’ Qk- Here n is taken to be 0(1). This is because, in 

the regime p = ^, the stationary distribution is concentrated around flk for k = 0(1). 

In [PS16], we show that for G = the 3-dimensional lattice on m = vertices 
and N{v) = {u : {u,v) E E} the usual neighborhood of a vertex, the mixing time of 
the restricted kernels Ki of this chain are 0{m3) and the mixing time of the projected 
kernel K is 0(m^log(m)). Using Lemma 2.1 of this paper, along with calculations sim¬ 
ilar to those in Theorem 3, Lemma 2.6, and Corollary 1, we obtain a mixing bound of 
0(max(m3, log(m))) for the KCIP chain on 

We now give a toy version of this process and explain why the methods in this note improve
upon existing bounds. Our toy process has a laziness parameter $d > 1$ and a size parameter
$m \in \mathbb{N}$; for fixed $d$, we consider the asymptotics of the mixing time as $m$ goes to infinity.
For each $m \in \mathbb{N}$, the state space of the model is $\Omega = \{(i,j) : i \in [m], j \in \{1, 2, 3\}\}$ and we
consider the partition $\Omega_i = \{(i,1), (i,2), (i,3)\}$, $1 \le i \le m$, of $\Omega$. The transition kernel $K$ is
given by:

X((z, !),(* + 1,1)) = X((p 1), (p 2)) = I K((z, 1,1)) = 1, 

6 3 

K((z, 2), (^, 1)) = K((z, 2), (z, 3)) = K((z, 3), (z, 2)) = ^, 

where the first and third expressions assume that $i < m$ and $i > 1$ respectively. For all other
$(i_1, j_1) \ne (i_2, j_2)$, $K((i_1, j_1), (i_2, j_2)) = 0$. Finally, we set $K(x,x) = 1 - \sum_{y \ne x} K(x,y)$. This
completes the definition of the transition matrix. It is immediate that

Pi = 
Denote by $\{X_t = (X_t[1], X_t[2])\}_{t \in \mathbb{N}}$ a Markov chain driven by $K$. We claim that, for any
$\epsilon > 0$ and $m > N(\epsilon)$ sufficiently large,

maxE[e2^t+E™i+'^[^] |Xt = x] = 0(1). (5.8) 

To prove this, let Diower = {(b 1) • 1 < f < 'i^} and let be the trace of on 

Diower- Under the identihcation (i, 1) i—)■ i of Diower with [m], {WjsgN has transition kernel 

.^lower(h^ T 1 ) X) X'iower (^0 1 ) TT) -^lower(^O) Ti 
0 6 2 , 

for i 7 ^ {l,m}. We can directly compute that {WjsgN satishes 

E[g|n+i|y^] < o.98e5^^ + 0.25. 

Let Kiower(T) = |{t < s < t + T : Xg G Diower}- By Corollary 2.11 of [Paul5], Kiower(em^+'^) = 
Q{em) as m goes to inhnity. This implies 

E[e5^‘+™i+Ui]|xj = 

< (0.98 + o(l))®(^”^)e^^‘ + , 

- ^ ^ 1-0.98 

which proves inequality (5.8). 

Fix O > 0. We consider the trace of {XjjtgN onto D' = As each part D* of the 

partition of D has three states, it is possible to check by direct computation that with the 
choice c = A all of the constants 6, e, D in the conditions of Corollary 1 and Lemma 2.6 are 
0(1) as m goes to inhnity for this trace (where the implied constants depend on C). Thus, 
by Corollary 1 and Lemma 2.6, the mixing time of the trace of onto D' = 

is at most = 0{mf‘). By Theorem 3 and inequality (5.8), this implies that the mixing 
time Tmix of {Xtjtgj^ must be at most Tmix = 0(m^’''‘^ log(m)). This also immediately implies 
that the relaxation time r^ei = 0 (m^’'''^ log(m)). 

We compare this bound to the bounds achievable by [JSTV04]. Recall that their projected
chain $\bar{K}$ is given by Equation (1.7), while their restricted chains are given by

$$K_i(x,y) = K(x,y) 1_{y \in \Omega_i}$$

for $x \ne y$ and $K_i(x,x) = 1 - \sum_{y \in \Omega_i, y \ne x} K_i(x,y)$. In this example, the $m$ restricted chains have

three points and the projected chain is an m-state, (1 — birth and death chain 

corresponding to random walk on the path with constant drift. 
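As a small illustration of this kind of restricted chain, the sketch below restricts a hypothetical 4-state kernel to one part of a two-part partition: moves inside the part are kept and the mass of rejected moves is placed on the diagonal. The kernel, the partition and the function name are assumptions; the projected chain of Equation (1.7) is not reproduced here.

```python
def restricted_kernel(K, part_states):
    """Restriction of K to part_states: keep moves inside the part, reject the rest."""
    idx = {x: a for a, x in enumerate(part_states)}
    Ki = [[0.0] * len(part_states) for _ in part_states]
    for x in part_states:
        for y in part_states:
            if x != y:
                Ki[idx[x]][idx[y]] = K[x][y]
        # The diagonal absorbs the holding probability plus all moves leaving the part.
        Ki[idx[x]][idx[x]] = 1.0 - sum(Ki[idx[x]])
    return Ki

# Hypothetical kernel on 4 states with parts {0, 1} and {2, 3}.
K = [
    [0.5, 0.3, 0.2, 0.0],
    [0.3, 0.5, 0.0, 0.2],
    [0.2, 0.0, 0.5, 0.3],
    [0.0, 0.2, 0.3, 0.5],
]
K1 = restricted_kernel(K, [0, 1])
K2 = restricted_kernel(K, [2, 3])
```

In this example each restricted kernel is the two-state chain with off-diagonal entries $0.3$ and diagonal entries $0.7$.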

These calculations let us compute the bound given by Theorem 1 of [JSTV04]. Using 
the trivial bound 7 < 1 on the correction term, these bounds imply that this walk has 
relaxation time Trei = which implies that Xm^ x = 0(m^‘^’''^ log(m)). For d large, 

the difference between our bounds and those in [JSTV04] is substantial. The discrepancy
between the mixing bounds stems from the fact that our two main bounds, on the number
of steps required to establish the drift condition inequality (5.8) and on the mixing time
(5.7) of the projected chains, are added to obtain our final bound on the mixing time, while
the corresponding bounds must be multiplied to obtain the final bound in [JSTV04]. We
emphasize that the discrepancy between our bounds remains large even if we restrict our
attention to the mixing time of the trace of {Xtjtgis} onto hi' = ^ 


5.2.1. Discussion of Other Interacting Particle Systems. The difficulties illustrated in the
toy example in Section 5.2 apply to more realistic interacting particle systems, including the
KCIP models defined in Example 5.1. We briefly sketch the problem, as complete proofs
would be long. Fix a sequence of connected graphs $\{G_n\}_{n \in \mathbb{N}}$ with $|G_n| = n$ and maximum
degree $\deg(G_n) < d$ for fixed $0 < d < \infty$; also fix a constant $0 < c < \infty$ as in that example.

Let $K$ be the transition kernel given in Example 5.1, and let $\bar{K}$ and $K_i$ be the projected
and restricted kernels associated with $K$ and the partition given in Equation (5.6). Let
$1 - \lambda_i$, $1 - \bar{\lambda}$ and $1 - \lambda$ be the spectral gaps of $K_i$, $\bar{K}$ and $K$ respectively, and let $\bar{\pi}$ be
the stationary distribution of $\bar{K}$. The calculation in Lemma 4.1 of [PS16] implies that
7L(1, 1 ) = 1 — and 7r(l) = 0(1), which in turn implies A = 1 — 0(n~^). Next, note 

that Ki{x,x) = 1 — 0(n“^) for all x G fli. In the special case that 

$$M(v) = \{u : (u,v) \in E\}, \qquad (5.9)$$

we have Ai = 1 — 0{n~‘^ max(n“^, (1 — A„))), where 1 — A^ is the spectral gap associated with 
the random walk on $G_n$. Finally, the correction term $\gamma$ defined in Equation 21 of [JSTV04],

$$\gamma = \max_{i} \max_{x \in \Omega_i} \sum_{y \in \Omega \setminus \Omega_i} K(x,y),$$


satisfies 7 “^ = O(n^). 

Thus, the best possible bound obtainable by Theorem 1 of [JSTV04] in the special case 
(5.9) is )• Even this bound is rather optimistic (for more complicated reasons, 

it is also not possible to obtain a bound better than = O(n^) using a decomposition 
of the form (5.6)), but is already quite far from the truth. For the main example studied 
in [PS16], the correct answer is at most = 0(n^log(n)), while this optimistic bound 

would give = 0{n^). Similar difficulties occur outside of the special case (5.9) (see, 
e.g., [CFM14] for the correct relaxation time for a KCIP that does not satisfy (5.9) ). 
Appendix A


Let $\{X_t\}_{t \geq 0}$ be a $\frac{1}{2}$-lazy, irreducible, reversible Markov chain on a finite state space $\Omega$ with
transition kernel $P$, and let $\{Y_t\}_{t \geq 0}$ be the Markov chain with transition kernel $Q = \frac{1}{2}(\mathrm{Id} + P)$
and initial point $Y_0 = X_0 = x$. Fix a subset $A \subset \Omega$ and let $\tau_A = \min\{t \geq 0 : X_t \in A\}$ and
$\tau'_A = \min\{t \geq 0 : Y_t \in A\}$. Finally, let $\tau_{\mathrm{mix}}$ denote the mixing time of $X_t$. Theorem 1.1 of
[PS13] says that there exist universal constants $d_\alpha, d'_\alpha$ such that

$$d'_\alpha \max_{x, A : \pi(A) \geq \alpha} \mathbb{E}_x[\tau'_A] \leq \tau_{\mathrm{mix}} \leq d_\alpha \max_{x, A : \pi(A) \geq \alpha} \mathbb{E}_x[\tau'_A], \qquad (5.10)$$

whereas our formulation in (2.2) bounds $\tau_{\mathrm{mix}}$ in terms of $\tau_A$. The following simple result will
be used to show that our inequality (2.2) is equivalent to Theorem 1.1 of [PS13].

Lemma 5.2. For any $A \subset \Omega$, there exists a universal constant $C$ that does not depend on
$P$, $A$ or $\Omega$ such that

$$\mathbb{E}[\tau_A] \leq \mathbb{E}[\tau'_A] \leq 8\,\mathbb{E}[\tau_A] + C. \qquad (5.11)$$

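Proof. The lower bound in inequality (5.11) is trivial. To prove the upper bound, we introduce
a version of $\{Y_t\}_{t \geq 0}$ on an augmented state space as follows. Let $\{A_t\}_{t \geq 0}$ be an i.i.d.
Bernoulli($\frac{1}{2}$) sequence, and then construct $\{Y_t\}_{t \geq 0}$ by drawing from the kernel:

$$\mathbb{P}[Y_{t+1} \in \cdot \mid A_t = 1, Y_t] = P(Y_t, \cdot),$$
$$\mathbb{P}[Y_{t+1} \in \cdot \mid A_t = 0, Y_t] = \delta_{Y_t}(\cdot).$$

This construction of $\{Y_t\}_{t \geq 0}$ has transition kernel $Q = \frac{1}{2}(\mathrm{Id} + P)$. Let $N_t = \sum_{s=0}^{t-1} A_s$ and let
$M_t = \min\{s \geq 0 : N_s = t\}$. We then construct $\{X_t\}_{t \geq 0}$ by setting

$$X_t = Y_{M_t};$$

this construction of $\{X_t\}_{t \geq 0}$ yields a Markov chain with transition matrix $P$. We then have,
for all $0 \leq a < 8$ and all $t \in \mathbb{N}$ that

$$\{\tau'_A > 8t + a\} \subset \{\tau_A > t\} \cup \{N_{8t} < t\},$$

and so

$$\mathbb{P}[\tau'_A > 8t + a] \leq \mathbb{P}[\tau_A > t] + \mathbb{P}[N_{8t} < t].$$

Summing this over $t$ and $a$, we have

$$\mathbb{E}[\tau'_A] = \sum_{a=0}^{7} \sum_{t=0}^{\infty} \mathbb{P}[\tau'_A > 8t + a]$$
$$\leq \sum_{a=0}^{7} \sum_{t=0}^{\infty} \left( \mathbb{P}[\tau_A > t] + \mathbb{P}[N_{8t} < t] \right)$$
$$\leq 8\,\mathbb{E}[\tau_A] + 8 \sum_{t=0}^{\infty} \mathbb{P}[N_{8t} < t].$$

Since $\sum_{t=0}^{\infty} \mathbb{P}[N_{8t} < t] < \infty$ by Hoeffding's inequality, the upper bound in inequality (5.11)
follows. □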
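The relationship between the hitting times of $P$ and of $Q = \frac{1}{2}(\mathrm{Id} + P)$ can be checked exactly on a small example. The sketch below is illustrative only and is not part of the paper's argument: the path-graph kernel, the target set `A = {0}` and all function names are assumptions chosen for the illustration. It solves the linear system $(\mathrm{Id} - P_{A^c})h = 1$ for the expected hitting times with exact rational arithmetic.

```python
from fractions import Fraction


def expected_hitting_times(P, target):
    """Exact E_x[tau_A] with tau_A = min{t >= 0 : X_t in A}, by Gauss-Jordan elimination."""
    n = len(P)
    free = [x for x in range(n) if x not in target]
    m = len(free)
    # Augmented system (Id - P restricted to the complement of A) h = 1.
    rows = [
        [Fraction(int(i == j)) - P[free[i]][free[j]] for j in range(m)] + [Fraction(1)]
        for i in range(m)
    ]
    for col in range(m):
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [v / scale for v in rows[col]]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [u - factor * v for u, v in zip(rows[r], rows[col])]
    h = [Fraction(0)] * n
    for i, x in enumerate(free):
        h[x] = rows[i][m]
    return h


def lazy_path_kernel(n):
    """1/2-lazy simple random walk on {0, ..., n-1}; blocked moves become holds."""
    P = [[Fraction(0)] * n for _ in range(n)]
    for x in range(n):
        P[x][x] += Fraction(1, 2)
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                P[x][y] += Fraction(1, 4)
            else:
                P[x][x] += Fraction(1, 4)
    return P


def half_identity_plus(P):
    """Return Q = (Id + P) / 2."""
    n = len(P)
    return [[(P[x][y] + int(x == y)) / 2 for y in range(n)] for x in range(n)]


P = lazy_path_kernel(6)
A = {0}
tau = expected_hitting_times(P, A)                            # E_x[tau_A] under P
tau_prime = expected_hitting_times(half_identity_plus(P), A)  # E_x[tau'_A] under Q
```

In this example the expected hitting time under $Q$ is exactly twice the one under $P$, because $\mathrm{Id} - Q = \frac{1}{2}(\mathrm{Id} - P)$ on the complement of $A$; this is consistent with both sides of inequality (5.11).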




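Since $\max_{x, A} \mathbb{E}_x[\tau_A] \geq 1$, from (5.11) it immediately follows that

$$\max \mathbb{E}[\tau_A] \leq \max \mathbb{E}[\tau'_A] \leq C' \max \mathbb{E}[\tau_A]$$

for some universal constant $C' > 0$. This in turn combined with (5.10) immediately implies
that

$$c'_\alpha \max_{x, A : \pi(A) \geq \alpha} \mathbb{E}_x[\tau_A] \leq \tau_{\mathrm{mix}} \leq c_\alpha \max_{x, A : \pi(A) \geq \alpha} \mathbb{E}_x[\tau_A]$$

for some universal constants $c_\alpha, c'_\alpha$ as claimed in Equation (2.2).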

We also prove inequality (1.9) from the introduction: 

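Proof of inequality (1.9). Let $T = \max\{t \in \mathbb{N} : \max_{x \in \Omega} \mathbb{P}_x[\tau_A > t] \geq e^{-1}\}$. Then Markov's
inequality gives

$$\max_{x \in \Omega} \mathbb{E}_x[\tau_A] \geq T \max_{x \in \Omega} \mathbb{P}_x[\tau_A > T] \geq e^{-1} T.$$

Combining this with inequality (1.8), we have for all $x \in \Omega$ and $k \in \mathbb{N}$,

$$\mathbb{P}_x\left[\tau_A > k e \max_{y \in \Omega} \mathbb{E}_y[\tau_A]\right] \leq \mathbb{P}_x[\tau_A > kT] \leq e^{-k}. \qquad (5.12)$$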

This immediately implies the desired inequality. □ 
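The exponential tail bound (5.12) can be checked numerically on a small chain. The following is a minimal sketch under stated assumptions: the $\frac{1}{2}$-lazy walk on a path of six points, the target set `A = {0}`, the truncation horizon `t_max` and all function names are choices made for this illustration, not objects from the paper. The tails $\mathbb{P}_x[\tau_A > t]$ are computed by the one-step recursion over the killed kernel, and the expected hitting times by summing the tails.

```python
import math


def lazy_path_kernel(n):
    """1/2-lazy simple random walk on {0, ..., n-1}; blocked moves become holds."""
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        P[x][x] += 0.5
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                P[x][y] += 0.25
            else:
                P[x][x] += 0.25
    return P


def hitting_tails(P, target, t_max):
    """tails[t][x] = P_x[tau_A > t] for t = 0..t_max, tau_A = min{t >= 0 : X_t in A}."""
    n = len(P)
    tails = [[0.0 if x in target else 1.0 for x in range(n)]]
    for _ in range(t_max):
        prev = tails[-1]
        tails.append([
            0.0 if x in target else sum(P[x][y] * prev[y] for y in range(n))
            for x in range(n)
        ])
    return tails


n = 6
P = lazy_path_kernel(n)
A = {0}
tails = hitting_tails(P, A, 5000)

# E_x[tau_A] = sum over t >= 0 of P_x[tau_A > t], truncated at t_max (the remainder is negligible here).
mean_hit = [sum(row[x] for row in tails) for x in range(n)]
worst_mean = max(mean_hit)

# Compare the worst-case tail at time k * e * max_y E_y[tau_A] with e^{-k}.
checks = []
for k in range(1, 6):
    t = math.floor(k * math.e * worst_mean)
    checks.append((k, max(tails[t]), math.exp(-k)))
```

Each tuple in `checks` holds `k`, the worst-case tail probability and the bound $e^{-k}$; in this example the tail sits well below the bound, which reflects the slack introduced by Markov's inequality.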


Appendix B 


We prove a series of standard technical lemmas, leading to the proof of Theorem 3: 


Lemma 5.3 (Drift Implies Concentration). Let $K$ be the transition kernel of a Markov chain
satisfying the conditions given in Theorem 3. Then


7r(£(M)) > 1 



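Proof. Let $\{X_t\}_{t \in \mathbb{N}}$ be a Markov chain with transition kernel $K$, started at stationarity, i.e.,
$X_0 \sim \pi$. Since $X_k$ then has distribution $\pi$ as well,

$$\pi(V) = \mathbb{E}[\mathbb{E}[V(X_k) \mid X_0]]$$
$$\leq \mathbb{E}[(1 - a)V(X_0) + b]$$
$$= (1 - a)\pi(V) + b.$$

Thus, $\pi(V) \leq \frac{b}{a}$, and so by Markov's inequality $\pi(\{x : V(x) > M\}) \leq \frac{b}{aM}$.
Since $4b/a \leq M$, $\pi(\mathcal{L}(M)) \geq 1 - \frac{b}{aM} \geq 3/4$ and the proof is finished. □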
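The two steps of this argument, $\pi(V) \leq b/a$ followed by Markov's inequality, can be verified exactly on a toy chain. The sketch below rests on several assumptions that are not taken from the paper: a birth-and-death chain on $\{0, \dots, 10\}$ with downward drift, the Lyapunov function $V(x) = 2^x$, a one-step drift condition (that is, $k = 1$), the constants $a = 1/8$ and $b = 3/8$, and the reading of $\mathcal{L}(M)$ as the sublevel set $\{x : V(x) \leq M\}$.

```python
from fractions import Fraction

N = 10
p, q = Fraction(1, 4), Fraction(3, 4)   # probabilities of moving up / down
a, b = Fraction(1, 8), Fraction(3, 8)   # assumed drift constants for V(x) = 2**x


def V(x):
    """Hypothetical Lyapunov function for this toy chain."""
    return Fraction(2) ** x


def birth_death_kernel():
    """Birth-and-death chain on {0, ..., N}; blocked moves become holds."""
    K = [[Fraction(0)] * (N + 1) for _ in range(N + 1)]
    for x in range(N + 1):
        if x < N:
            K[x][x + 1] += p
        else:
            K[x][x] += p
        if x > 0:
            K[x][x - 1] += q
        else:
            K[x][x] += q
    return K


K = birth_death_kernel()

# Stationary distribution from detailed balance: pi(x) is proportional to (p/q)**x.
weights = [(p / q) ** x for x in range(N + 1)]
total = sum(weights)
pi = [w / total for w in weights]

# One-step drift condition E[V(X_1) | X_0 = x] <= (1 - a) V(x) + b at every state.
drift_ok = all(
    sum(K[x][y] * V(y) for y in range(N + 1)) <= (1 - a) * V(x) + b
    for x in range(N + 1)
)

pi_V = sum(pi[x] * V(x) for x in range(N + 1))   # stationary mean of V
M = 4 * b / a                                    # threshold used in the lemma
mass_in_sublevel = sum(pi[x] for x in range(N + 1) if V(x) <= M)
```

Here `pi_V` stays below `b / a` and `mass_in_sublevel` exceeds $3/4$, matching the two conclusions of the proof for this particular chain.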

Lemma 5.4. Fix $0 < \beta < 1$ and $0 < \gamma < \infty$. Consider a stochastic process $\{X_t\}_{t \in \mathbb{N}}$ with
associated filtration $\mathcal{F}_t$ that satisfies the drift condition

$$\mathbb{E}[X_{s+1} \mid \mathcal{F}_s] \leq (1 - \beta) X_s + \gamma$$

for all $s \in \mathbb{N}$. Let $Z_0, Z_1, \ldots$ be an i.i.d. sequence of random variables with geometric
distribution and mean Then, if Xq < ^, we have for all C, T G N that 

T c 

s=0 i=l 

Proof. Assume Xq < ^ and let Tret = min{t >0 : Xt < ^}. Then for all s > 2, 

= E[E[XslT-,gt>s| 5_l]] 

<E[((1-/?)X,_i + 7)1.„,>,] 

= El((l - + 7)lx„.>J 

<E1((1-|)A._i-0+7)1,„.>,] 

< (1 - |)El.Y,_,lx...>J. 

Iterating this inequality and noting that E[Xi1t-^^^>s] < (1 — (3)^ + 7 < (l — f)^, we have 

E[A.1„..>.] < (l-f)’y, 


and so 

P1t„. > s] < P|A,lx...>, > ^] < (1 - |)‘. (5.13) 

Dehne to = 0 and p+i = min{s > U : Xg < The fact that inequality (5.13) holds 
uniformly in Xq implies that 

P[fi+1 -ti> s|{tj}j<i] < (l - ^y. (5.14) 

Then 

T C 

<C\< P|io > r] = pE(*‘ - ‘‘-i) > ^1- (5-15) 

s=0 i=l 

Inequality (5.14) implies that the distribution of (fj+i — tf) is (conditionally on {tj}j<i) 
stochastically dominated by a geometric distribution with mean combining this with 
inequality (5.15) completes the proof. □ 

We apply this to show that the set $\mathcal{L}(C)$ defined in equation (4.2) has moderately large
occupation measure for $C$ sufficiently large:

Corollary 5. Let $\{X_t\}_{t \in \mathbb{N}}$ be a Markov chain satisfying the conditions given in Theorem 3
and let C be as in equation (4.2). For fixed Ci > ^, 0 < (^2 < | and any starting point Xq 
and time U eN, 

u 

P[^ IxtGCiCi) < C2U] < + e“^. 

t=o 
Proof. Let $\tau_{\mathrm{start}} = \min\{t \geq 0 : X_t \in \mathcal{L}(C_1)\}$. By inequality (4.1), we have for any $X_t$
satisfying $V(X_t) > C_1 > \frac{4b}{a}$ that

$$\mathbb{E}[V(X_{t+k}) \mid X_t] \leq (1 - a)V(X_t) + b$$
$$= (1 - \tfrac{a}{2})V(X_t) - \tfrac{a}{2}V(X_t) + b$$
$$\leq (1 - \tfrac{a}{2})V(X_t) - b \leq (1 - \tfrac{a}{2})V(X_t).$$

By Markov’s inequality and the trivial bound that V) < Idnax for all t G N, this implies 

PlTstart > kt] < P[ 14 t 

<4"max(l-|)*. (5.16) 

Fix T < 17 G M and let be an i.i.d. sequence of random variables with geometric 

distribution and mean ^ . By inequality (5.16), the Markov property and Lemma 5.4, 

u u 

> C 2 U] > p[y^ lxte/:(Ci) > C'2l7|rstart < 7’]P[rstart < T] 

t=0 t=T 

T U 

= P[T-start < T] lx,ez:(Ci) > C2f^kstart = t]P[T-start = ^kstart < T] 

t=0 s=t 

1 T ^ 

> (1 - 14iax(l - -a) ^ - t]P[rstart = t|rstart < T] 

t=0 i=l 

1 T 

> (l-K.a.(l--a)^^V[^Z, < f/-T]. 

Choosing T = [yj, we have for C 2 > ^ that 

pE1m(co >^1 > (l-K,axe-^LSJ)(l-e-w). 
t=o 

The second factor in the above inequality is bounded using a standard concentration inequality for
geometric random variables. This completes the proof. □

Theorem 3 now follows immediately from Lemmas 2.1 and 5.3 and Corollary 5: 

Proof of Theorem 3. We apply Lemma 2.1, choosing in the notation of that lemma [n] = 2, 
= £(M), 112 = I = {1}, a = |, and /9 = |. Lemma 5.3 implies that this choice of 

$I$ and $\beta$ satisfies the requirements of Lemma 2.1.

Set 7 = |- In the notation of Corollary 5, choosing 

4 16 

U >T = -max(—rC,^,/c log(1617max), 8 log(16)), 
a c!) 

Cl = M and C 2 = gives 

p|K,(f/) < Xu < 1 

O 
The result then follows immediately from Lemma 2.1.


□ 


Appendix C 

We prove the claims made in Example 4.3, with the ultimate goal of applying Theorem 4. 

Lemma 5.5 (Coupling to One Point). Consider a Markov chain $\{X_t\}_{t \geq 0}$ with transition
kernel $K$ and stationary distribution $\pi$ on a finite state space $\Omega$ with privileged point $z \in \Omega$.
Let $\tau = \min\{t \geq 0 : X_t = z\}$. Assume that

$$\max_{x \in \Omega} \mathbb{E}[\tau \mid X_0 = x] \leq T, \qquad \pi(z) \geq 1 - \epsilon$$
for some e < |. Then the mixing time Tmix of K satisfies 

Pmix < \eT] riog( ^|^ -6) ^^' 

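Proof. Fix $x \in \Omega$. Let $\{X_t\}_{t \geq 0}$ be a copy of the Markov chain started at $X_0 = x$, and let
$\{Y_t\}_{t \geq 0}$ be a copy of the Markov chain started according to the stationary distribution, so
that $Y_0 \sim \pi$. Let $\tau_{\mathrm{coll}} = \min\{t \geq 0 : X_t = Y_t\}$ be the collision time of $\{X_t\}_{t \geq 0}$, $\{Y_t\}_{t \geq 0}$. We
couple these two chains so that they move independently until time $\tau_{\mathrm{coll}}$ and satisfy $X_s = Y_s$
for all $s \geq \tau_{\mathrm{coll}}$. By inequality (1.9), we then have for all $t \in \mathbb{N}$,

$$\mathbb{P}[\tau_{\mathrm{coll}} \leq t] \geq \mathbb{P}[\tau \leq t]\, \mathbb{P}[X_\tau = Y_\tau \mid \tau \leq t]$$
$$\geq \left(1 - e^{-\lfloor t/(eT) \rfloor}\right)(1 - \epsilon).$$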

By the standard ‘coupling lemma’ for Markov chains (see Prop 4.7 of [LPW09]), 

3 14 

Pmix < min{t > 0 : P[rcoii < t] > -} < \eT] riog( ^^^ _ ^^ )] 

and the proof is finished. □ 
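The key step of this proof, that the total variation distance at time $t$ is at most $1 - \mathbb{P}[\tau \leq t]\,\pi(z)$, can be checked on a toy chain with a heavy atom. The sketch below is an illustration under assumptions of its own: the five-state chain, the parameter `delta`, and the function names are hypothetical and do not come from Example 4.3. For this chain the worst-case tail $\max_x \mathbb{P}_x[\tau > t]$ equals $2^{-t}$ exactly, so the hitting-time part of the bound is available in closed form.

```python
def atom_chain(n, delta):
    """State 0 is the atom z: leave z with probability delta, return to z with probability 1/2."""
    P = [[0.0] * n for _ in range(n)]
    P[0][0] = 1.0 - delta
    for x in range(1, n):
        P[0][x] = delta / (n - 1)
        P[x][0] = 0.5
        P[x][x] = 0.5
    return P


def step(rows, P):
    """Advance every row distribution by one step of the kernel P."""
    n = len(P)
    return [[sum(r[k] * P[k][y] for k in range(n)) for y in range(n)] for r in rows]


n, delta = 5, 0.1
P = atom_chain(n, delta)

# Stationary distribution (from detailed balance): pi(z) = 1 / (1 + 2 * delta).
pi_z = 1.0 / (1.0 + 2.0 * delta)
pi = [pi_z] + [(1.0 - pi_z) / (n - 1)] * (n - 1)

rows = [[float(x == y) for y in range(n)] for x in range(n)]  # rows of P^0
results = []
for t in range(21):
    # Worst-case total variation distance to stationarity at time t.
    tv = max(0.5 * sum(abs(r[y] - pi[y]) for y in range(n)) for r in rows)
    worst_tail = 0.5 ** t                       # max_x P_x[tau > t] for this chain
    coupling_bound = 1.0 - (1.0 - worst_tail) * pi_z
    results.append((t, tv, coupling_bound))
    rows = step(rows, P)
```

Each entry of `results` pairs the exact distance with the coupling bound; the bound is loose for large `t` because it never drops below $1 - \pi(z)$, which is why the lemma requires $\epsilon$ to be small.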

As mentioned before, we are interested in calculating the mixing time in Example 4.3 with
$k$, $\ell$ and $C > 6$ held constant and $m$ going to infinity. Theorem 4 can also be applied when
$C = C(m) \to \infty$ using similar (and indeed much easier) bounds; when $C = C(m) = o(1)$,
we expect Theorem 4 to become ineffective. In order to apply Theorem 4, we must prove a
contraction inequality of the form (4.4) and also an occupation inequality of the form (4.5).

We assume for the remainder of this section that m 3> f'. We begin by proving the 
contraction estimate (4.4). Fix x G let {W}t>o be a copy of the Markov chain evolving 
according to K and started at Xq = x, and let Xcentre = min{t >0 : A* = (0, 0,..., 0)}. We 
begin by comparing Tcentre to r 0 ,esc- For any x G fl 0 ^\{(O, 0,..., 0)}, 

E[H{Xt+,)\Xt = x]< H{x) + (5.17) 

3 3m 

Since \H{Xt+i) — H{Xt) \ < 1 , this inequality implies by the classical ‘gambler’s ruin’ calcu¬ 
lation that 

min P,j,[rcentre < P 0 ,esc] > 1 “ (5.18) 
For $z \subset [m]$, let $f(z)$ be the unique point in that satisfies $H(f(z)) = 0$; for example,
$f(\emptyset) = (0,0,\ldots,0)$. Let $\{X_t\}_{t \geq 0}$, $\{Y_t\}_{t \geq 0}$ be two Markov chains evolving according to $K$,
with Xq = X & arbitrary and Iq = (0,0, ...,0). Let be their respective 

escape times. By inequality (5.18), 

||/:(X^M ) - £(y ( 0 ) )||tv < Px[rce„tre > r0,esc] < . (5.19) 

0,esc 0,esc 

The kernel iFLL associated with base kernel K and partition is exactly 

$\frac{1}{2}$-lazy simple random walk on the hypercube $\{0,1\}^m$ (see equation (4.3) for the construction
of L^ll)- It is straightforward to check via a path-coupling argument (or see the proof 
of Theorem 15.1 of [LPW09] at inverse-temperature $\beta = 0$) that this kernel satisfies the
contraction condition 

fT,(ZLL(a:,-),^LL(2/,-)) 1 

max -- <1-• 

x, 3 /g{o,i}’" d[x,y) m 
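The contraction factor $1 - \frac{1}{m}$ for the $\frac{1}{2}$-lazy walk on the hypercube can be confirmed by enumerating the standard coupling. The sketch below is a small illustration, with $m = 4$ and the function names chosen as assumptions for the example: both chains pick the same uniformly random coordinate and overwrite it with the same uniformly random bit, which has the $\frac{1}{2}$-lazy simple random walk as each marginal. The expected Hamming distance after one coupled step is an upper bound on the transportation distance between the two one-step distributions.

```python
from fractions import Fraction
from itertools import product


def hamming(x, y):
    """Hamming distance between two 0/1 tuples of equal length."""
    return sum(u != v for u, v in zip(x, y))


def coupled_expected_distance(x, y):
    """Expected Hamming distance after one coupled step of the 1/2-lazy walk.

    Coupling: both chains choose the same uniform coordinate i and the same
    uniform bit, and overwrite coordinate i with that bit.
    """
    m = len(x)
    total = Fraction(0)
    for i in range(m):
        for bit in (0, 1):
            x_new = x[:i] + (bit,) + x[i + 1:]
            y_new = y[:i] + (bit,) + y[i + 1:]
            total += Fraction(hamming(x_new, y_new), 2 * m)
    return total


m = 4
cube = list(product((0, 1), repeat=m))

# Worst ratio of expected distance after one step to the initial distance.
worst_ratio = max(
    coupled_expected_distance(x, y) / hamming(x, y)
    for x in cube
    for y in cube
    if x != y
)
```

For every pair of distinct points the ratio equals $1 - \frac{1}{m}$, since each disagreeing coordinate is repaired exactly when it is the one selected.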


Thus, by inequality (5.19) the kernel K satisfies inequality (4.4) with constants 

a = l-l-2(— 

m ^ 3 


3 


(5.20) 


We now prove occupation inequalities of the form (4.5). This requires estimates of the 
escape time r^^esc and the mixing time By the same calculations that give inequality 
(5.17), we have for H > 0 that 

= xe fli"^{(0, 0,..., 0)}] < ^ —(e"^ - 1)) 

3 3m 

= ( 0 , 0 ,..., 0 )] < 

3 

Setting D = and iterating, we have 

1 


E[e2-f^(^diogM|x^ = ^] < (1 


)E[eT^(^*-i)i°g(™)|Xo = x] + -e-T^°sim) 
1 ^771j 3 


1 


< (1-F-fc)log(m) ^ log(m) 


12m 


(5.22) 


for all X G ■ Iterating inequality (5.21) again and applying Markov’s inequality gives 

sup Px[rcentre > t] < SUp E^, log(m) log(m) 






<(l- 


12m 


)tg§F-fc)log(m)g-f log(m) 


By Lemma 5.3, it also gives 


vr(/(0)) 

7r(f20) 


> 1 - 96mV^^°s(™\ 


(5.23) 


(5.24) 
By inequalities (5.23) and (5.24), K(/, satisfies the hypotheses of Lemma 5.5 with T = 
and e = Thus, for sufficiently large m. 


log(2) + £ifM(«-*:-l) ^ 

< 18Cm(£ — k + 1) log(m). 


(5.25) 


The symmetry of the problem implies that (^max = This gives the desired upper bound 
on (^max- To obtain a lower bound on 990 , we note from inequality (5.24) that 

9^0 ^ 3' IE 3 ; [Tcentre] • 

By considering the number of steps it takes to get to (0, 0,..., 0), we have max (k) Tcentre > 
m{£—k—l). By the standard ‘coupon collector’ argument, we also have max^^^(fe) E 2 .[rcentre] > 
^mlog(m). Combining these two bounds with inequality (5.25), 

Tfl 1 

18Cm{£ — fc + 1) > </90 > — max(£ — k — 1,- log(m)). (5.26) 

8 4 

Finally, we point out that the escape time r^^esc is often close to E[r 0 ,esc|-^o = (0,..., 0)]. 
In one direction, the symmetry of the problem and the fact that the Markov chain only 
changes a single coordinate by 1 at every step together imply 


E[r0,esc|^o = 2 : e < E[r0,esc|^o = (0, 0,..., 0)]. 

Thus, for any c G N, inequality (1.9) implies 

E['r0,esc > ceE[r0^esc|-^o = (0, 0,..., 0)] |Xo = 2 ; G < e“L (5.27) 

In the other direction, for all $t > 0$ we have

max P[r0,esc < t\Xo = z] < P[r0,esc < t\Xo = (0, . . . , 0)] + max P^[r0,esc < Tcentre]- 

This bound, together with inequality (5.18), implies 

max P[r0,esc < 3 E[r 0 ,esc|^o = (0, 0,..., 0)]|Xo = z]<^+ (5.28) 

o 4 d 

Thus, inequalities (5.27) and (5.28) give for large m 

max P[r0,esc < 3 E[r 0 ,esc|^o = (0, 0,..., 0)]|Xo = z]<^ 

^GO^'=) o 2 

1 (5.29) 

max P[r 0 ,esc > 4E[r0,esc|27o = (0, 0,..., 0)] |Xo = z] < -. 

.GO« 2 

We point out that the expected escape time E[r0^esc|27o = (0, 0,..., 0)] is large compared 
to the mixing time 990 . By the monotonicity of the chain and inequality (5.24), it also gives 

P[T0,esc < t\Xo = (0, 0,..., 0)] < t max P[i7(Xj >£- k\Xo = (0, 0,..., 0)] (5.30) 

0<s<t 
< t max logM = (q, 0,..., 0)] 

0<s<t 


where the second step is Markov’s inequality. Combining this bound with inequality (5.26), 
we have 


< 64m(pmax|^o = (0,..., 0)] < —. 


m 


(5.31) 


Inequality (5.30) also gives 


,esc < = (0, . . . , 0)] < 


so


192m2 

®^0[h3,esc] ^ 


384m2 


^Clog{m) 


(5.32) 


Finally, using inequalities (5.20) and (5.29) and the calculation in Example 12.17 of [LPW09] 
on the mixing time of simple random walk on the hypercube, we can apply Theorem 4 with 
constants 

1 1 

1 ~ , (3 r, Dmax ^ 

2m m-^ 

1« o®^[^0>esc|^o = (0, 0,..., 0)] log2(e) 

Oi = 1602 = 2- 

^1 = ^2 = ^, ^ < 2mlog(m). 


The associated value of T may be taken to be T = 8mlog(8m). By inequality (5.31), we 
have that 01,02 3> 1. This completes the proof of inequality (4.14). 